date:20220114

On Thu, Jan 13, 2022 at 12:43 PM Peter Maydell wrote:

> On Sun, 9 Jan 2022 at 16:41, Warner Losh  wrote:
> >
> > Implement conversion of host to target siginfo.
> >
> > Signed-off-by: Stacey Son 
> > Signed-off-by: Kyle Evans 
> > Signed-off-by: Warner Losh 
> > ---
> >  bsd-user/signal.c | 37 +
> >  1 file changed, 37 insertions(+)
> >
> > diff --git a/bsd-user/signal.c b/bsd-user/signal.c
> > index 7168d851be8..3fe8b2d9898 100644
> > --- a/bsd-user/signal.c
> > +++ b/bsd-user/signal.c
> > @@ -43,6 +43,43 @@ int target_to_host_signal(int sig)
> >  return sig;
> >  }
> >
> > +/* Siginfo conversion. */
> > +static inline void host_to_target_siginfo_noswap(target_siginfo_t *tinfo,
> > +const siginfo_t *info)
> > +{
> > +int sig, code;
> > +
> > +sig = host_to_target_signal(info->si_signo);
> > +/* XXX should have host_to_target_si_code() */
> > +code = tswap32(info->si_code);
> > +tinfo->si_signo = sig;
> > +tinfo->si_errno = info->si_errno;
> > +tinfo->si_code = info->si_code;
> > +tinfo->si_pid = info->si_pid;
> > +tinfo->si_uid = info->si_uid;
> > +tinfo->si_status = info->si_status;
> > +tinfo->si_addr = (abi_ulong)(unsigned long)info->si_addr;
> > +/* si_value is opaque to kernel */
> > +tinfo->si_value.sival_ptr =
> > +(abi_ulong)(unsigned long)info->si_value.sival_ptr;
> > +if (SIGILL == sig || SIGFPE == sig || SIGSEGV == sig || SIGBUS == sig ||
>
> Don't use yoda-conditions, please. sig == SIGILL, etc.
>
> > +SIGTRAP == sig) {
> > +tinfo->_reason._fault._trapno = info->_reason._fault._trapno;
> > +}
> > +#ifdef SIGPOLL
> > +if (SIGPOLL == sig) {
> > +tinfo->_reason._poll._band = info->_reason._poll._band;
> > +}
> > +#endif
> > +if (SI_TIMER == code) {
> > +int timerid;
> > +
> > +timerid = info->_reason._timer._timerid;
> > +tinfo->_reason._timer._timerid = timerid;
> > +tinfo->_reason._timer._overrun = info->_reason._timer._overrun;
> > +}
> > +}
>
> I think this will only compile on FreeBSD (the other BSDs having
> notably different target_siginfo_t structs); I guess we're OK
> with that ?
>

Yes. The bsd-user fork does not compile on the other BSDs. There are too
many things missing, and too few places where specific code is in place.
I'm thinking that it won't be possible to implement running NetBSD
binaries on FreeBSD or vice versa since they are so different. OpenBSD
and NetBSD are a lot closer to each other, with fewer critical
differences in areas like threads, so it may be possible there. A lot of
rework is needed in this area to take even what's in the bsd-user fork
and make it work on NetBSD or OpenBSD, exactly for reasons like this.
I've been ignoring the elephant in the room for a while now, ever since
I realized this fundamental shift.

> I also commented on the general setup linux-user has for this
> function back in patch 2; I'll let you figure out whether what
> you have here is the right thing for BSD.
>

Yeah. I'm still thinking through what you said there (and elsewhere).
These issues may be the root cause of some regressions in arm binaries
between 6.1 and 6.2 as I tried to adopt the sigsegv/sigbus changes.

I need to work through those things in our development branch before
trying to fold them into this series. I'm not yet sure of the right way
to do that, because many of the changes are likely to be largish and it
may be tough to keep this patch series in sync. So I'm going to do all
the trivial style and tiny bug fixes first, then tackle this more
fundamental issue. I've thought about it enough to understand that the
code in this patch series has some conceptual mistakes that must be
addressed. Having this very detailed feedback is quite helpful in laying
out the path for me to fix these issues. Even if I don't ultimately do
everything like linux-user, I'll know why it's different, rather than
the current situation where there's much inherited code and the best
answer I could give is 'well, linux-user was like that 5 years ago and
we needed to make these hacks to make things work', which is completely
unsatisfying to give and to hear.

Warner
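
As a minimal sketch of the style point raised in the review (constant on the right-hand side of each comparison), the fault-signal test could be pulled into a small helper. Everything here is a simplified stand-in: `struct host_info`, `struct target_info`, `is_fault_signal()` and `convert_info()` are hypothetical names, not the real `siginfo_t` or `target_siginfo_t`, and the signal numbers are the FreeBSD values written out as local constants so the example does not depend on the host's <signal.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* FreeBSD signal numbers, hard-coded here for illustration only. */
enum { T_SIGILL = 4, T_SIGTRAP = 5, T_SIGFPE = 8, T_SIGBUS = 10, T_SIGSEGV = 11 };

/* Hypothetical, heavily simplified stand-ins for the two siginfo layouts. */
struct host_info   { int si_signo; int si_code; int trapno; };
struct target_info { int si_signo; int si_code; int trapno; };

/* True for the synchronous fault signals whose info carries a trap number. */
static bool is_fault_signal(int sig)
{
    return sig == T_SIGILL || sig == T_SIGFPE || sig == T_SIGSEGV ||
           sig == T_SIGBUS || sig == T_SIGTRAP;
}

/* Copy the common fields, and the trap number only where it is meaningful. */
void convert_info(struct target_info *t, const struct host_info *h)
{
    assert(t != NULL && h != NULL);
    t->si_signo = h->si_signo;  /* assumes identical host/target numbering */
    t->si_code = h->si_code;
    t->trapno = is_fault_signal(h->si_signo) ? h->trapno : 0;
}
```

Keeping the signal list in one helper also gives a single place to adjust if the set of fault signals turns out to differ per target.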

RE: hexagon container update

2022-01-14 Thread Brian Cain

> -----Original Message-----
> From: Brian Cain
> Sent: Friday, October 1, 2021 7:23 PM
> To: Richard Henderson ; Alex Bennée
> ; qemu-devel@nongnu.org
> Cc: Taylor Simpson 
> Subject: RE: hexagon container update
> 
> > -----Original Message-----
> > From: Brian Cain
> ...
> > > -----Original Message-----
> > > From: Richard Henderson 
> > ...
> > > On 10/1/21 12:59 PM, Brian Cain wrote:
> > > > Alex,
> > > >
> > > > We need to update the docker container used for hexagon for new test
> > cases
> > > proposed in Taylor's recent patch series under review.  Thankfully,
> > CodeLinaro
> > > has provided a binary of the hexagon cross toolchain so now I think we can
> > > simplify the hexagon docker file to something like the below.  I hope this
> also
> > > means that we can remove the exceptional handling for the hexagon
> > container.

We had some issues with the previous attempt to update the container.  The 
linux-user "signals" test failed.  Richard pointed out that the archive of the 
C library had what looks like a defect that would cause the test to fail.

https://www.mail-archive.com/qemu-devel@nongnu.org/msg849635.html

I'm following up now with a workaround - the attached patch references a 
toolchain which avoids the problem and passes the signals test.  This toolchain 
is based on llvm+clang 13.0.1-rc2.  BTW, the release page has a signature 
provided in case anyone wants to verify the download: 
https://github.com/quic/toolchain_for_hexagon/releases/tag/13.0.1-rc2_

Can we try again with this new container definition?

-Brian

0001-Update-Hexagon-toolchain-to-13.0.1-rc2.patch
Description: 0001-Update-Hexagon-toolchain-to-13.0.1-rc2.patch

Re: [PATCH v1 1/2] decodetree: Add an optional predicate-function for decoding

2022-01-14 Thread Alistair Francis

On Thu, Jan 13, 2022 at 6:28 PM Philipp Tomsich
 wrote:
>
> On Thu, 13 Jan 2022 at 06:07, Alistair Francis  wrote:
> >
> > On Thu, Jan 13, 2022 at 1:42 AM Philipp Tomsich
> >  wrote:
> > >
> > > Alistair,
> > >
> > > Do you (and the other RISC-V custodians of target/riscv) have any opinion 
> > > on this?
> > > We can go either way — and it boils down to a style and process question.
> >
> > Sorry, it's a busy week!
> >
> > I had a quick look over this series and left some comments below.
>
>
> Thank you for taking the time despite the busy week — I can absolutely
> relate, as it seems that January is picking up right where December
> left off ;-)
>
> >
> > >
> > > Thanks,
> > > Philipp.
> > >
> > > On Mon, 10 Jan 2022 at 12:30, Philippe Mathieu-Daudé  
> > > wrote:
> > >>
> > >> On 1/10/22 12:11, Philipp Tomsich wrote:
> > >> > On Mon, 10 Jan 2022 at 11:03, Philippe Mathieu-Daudé wrote:
> > >> >
> > >> > On 1/10/22 10:52, Philipp Tomsich wrote:
> > >> > > For RISC-V the opcode decode will change between different vendor
> > >> > > implementations of RISC-V (emulated by the same qemu binary).
> > >> > > Any two vendors may reuse the same opcode space, e.g., we may end
> > >> > up with:
> > >> > >
> > >> > > # *** RV64 Custom-3 Extension ***
> > >> > > {
> > >> > >   vt_maskc   000  . . 110 . 011 @r
> > >> > |has_xventanacondops_p
> > >> > >   vt_maskcn  000  . . 111 . 011 @r
> > >> > |has_xventanacondops_p
> > >> > >   someone_something   . 000 . 1100111 @i
> > >> > > |has_xsomeonesomething_p
> > >> > > }
> >
> > I don't love this. If even a few vendors use this we could have a huge
> > number of instructions here
> >
> > >> > >
> > >> > > With extensions being enabled either from the commandline
> > >> > > -cpu any,xventanacondops=true
> > >> > > or possibly even having an AMP in one emulation setup (e.g.
> > >> > > application
> > >> > > cores having one extension and power-management cores having a
> > >> > > different one — or even a conflicting one).
> >
> > Agreed, an AMP configuration is entirely possible.
> >
> > >> >
> > >> > I understand, I think this is what MIPS does, see commit 
> > >> > 9d005392390:
> > >> > ("target/mips: Introduce decodetree structure for NEC Vr54xx 
> > >> > extension")
> > >> >
> > >> >
> > >> > The MIPS implementation is functionally equivalent, and I could see us
> > >> > doing something similar for RISC-V (although I would strongly prefer to
> > >> > make everything explicit via the .decode-file instead of relying on
> > >> > people being aware of the logic in decode_op).
> > >> >
> > >> > With the growing number of optional extensions (as of today, at least
> > >> > the Zb[abcs] and vector come to mind), we would end up with a large
> > >> > number of decode-files that will then need to be sequentially called
> > >> > from decode_op(). The predicates can then move up into decode_op,
> > >> > gating whether the auto-generated decoders are invoked.
> > >> >
> > >> > As of today, we have predicates for at least the following:
> > >> >
> > >> >   * Zb[abcs]
> > >> >   * Vectors
> >
> > I see your point, having a long list of decode_*() functions to call
> > is a hassle. On the other hand having thousands of lines in
> > insn32.decode is also a pain.
> >
> > In saying that, having official RISC-V extensions in insn32.decode and
> > vendor instructions in insn.decode seems like a reasonable
> > compromise. Maybe even large extensions (vector maybe?) could have
> > their own insn.decode file, on a case by case basis.
> >
> > >> >
> > >> > As long as we are in greenfield territory (i.e. not dealing with
> > >> > HINT-instructions that overlap existing opcode space), this will be 
> > >> > fine
> > >> > and provide proper isolation between the .decode-tables.
> > >> > However, as soon as we need to implement something along the lines (I
> > >> > know this is a bad example, as prefetching will be a no-op on qemu) of:
> > >> >
> > >> > {
> > >> >   {
> > >> > # *** RV32 Zicbop Standard Extension (hints in the ori-space) ***
> > >> > prefetch_i  ... 0 . 110 0 0010011 @cbo_pref
> > >> > prefetch_r  ... 1 . 110 0 0010011 @cbo_pref
> > >> > prefetch_w  ... 00011 . 110 0 0010011 @cbo_pref
> > >> >   }
> > >> >   ori   . 110 . 0010011 @i
> > >> > }
> > >> >
> > >> > we'd need to make sure that the generated decoders are called in the
> > >> > appropriate order (i.e. the decoder for the specialized instructions
> > >> > will need to be called first), which would not be apparent from looking
> > >> > at the individual .decode files.
> > >> >
> > >> > Let me know what direction we want to take (of course, I have a bias
> > >> > towards the one in the patch).
> > >>
> >
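
The ordering concern discussed above can be sketched in a few lines of C: a list of decoders, each gated by a predicate, tried in sequence, with the specialized decoder placed before the general one. This is only an illustration under stated assumptions. `struct cpu_cfg`, `ext_zicbop`, the decoder functions and this `decode_op()` are hypothetical names and not QEMU's generated code, and the prefetch match (ORI with rd == x0 and imm[4:0] of 0, 1 or 3) is my reading of the Zicbop hint encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum insn_kind { INSN_ILLEGAL, INSN_ORI, INSN_PREFETCH };

/* Hypothetical per-CPU configuration; the field name is illustrative. */
struct cpu_cfg { bool ext_zicbop; };

struct decoder {
    bool (*enabled)(const struct cpu_cfg *cfg);  /* predicate */
    enum insn_kind (*decode)(uint32_t insn);     /* INSN_ILLEGAL = no match */
};

static bool always(const struct cpu_cfg *cfg) { (void)cfg; return true; }
static bool has_zicbop(const struct cpu_cfg *cfg) { return cfg->ext_zicbop; }

/* ORI: opcode 0010011, funct3 110. */
static bool is_ori(uint32_t insn) { return (insn & 0x707f) == 0x6013; }

/* Assumed hint encoding: ORI with rd == x0 and imm[4:0] in {0, 1, 3}. */
static enum insn_kind decode_prefetch(uint32_t insn)
{
    uint32_t rd = (insn >> 7) & 0x1f;
    uint32_t hint = (insn >> 20) & 0x1f;

    if (is_ori(insn) && rd == 0 && (hint == 0 || hint == 1 || hint == 3)) {
        return INSN_PREFETCH;
    }
    return INSN_ILLEGAL;
}

static enum insn_kind decode_base(uint32_t insn)
{
    return is_ori(insn) ? INSN_ORI : INSN_ILLEGAL;
}

/* The specialized decoder must come first or the general one shadows it. */
static const struct decoder decoders[] = {
    { has_zicbop, decode_prefetch },
    { always,     decode_base },
};

enum insn_kind decode_op(const struct cpu_cfg *cfg, uint32_t insn)
{
    assert(cfg != NULL);
    for (size_t i = 0; i < sizeof(decoders) / sizeof(decoders[0]); i++) {
        if (decoders[i].enabled(cfg)) {
            enum insn_kind kind = decoders[i].decode(insn);
            if (kind != INSN_ILLEGAL) {
                return kind;
            }
        }
    }
    return INSN_ILLEGAL;
}
```

With the table in this order, the word 0x00006013 decodes as a prefetch hint when the extension is enabled and as a plain ori otherwise. Swapping the two entries would make the hint unreachable, and that dependency is not visible from either decoder alone.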

Re: [PATCH 2/8] target/ppc: 405: Add missing exception handlers

2022-01-14 Thread Fabiano Rosas

David Gibson  writes:

> On Mon, Jan 10, 2022 at 03:15:40PM -0300, Fabiano Rosas wrote:
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  target/ppc/cpu_init.c | 2 ++
>>  1 file changed, 2 insertions(+)
>> 
>> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
>> index a50ddaeaae..9097948e67 100644
>> --- a/target/ppc/cpu_init.c
>> +++ b/target/ppc/cpu_init.c
>> @@ -1951,7 +1951,9 @@ static void init_excp_4xx_softmmu(CPUPPCState *env)
>>  env->excp_vectors[POWERPC_EXCP_EXTERNAL] = 0x0500;
>>  env->excp_vectors[POWERPC_EXCP_ALIGN]= 0x0600;
>>  env->excp_vectors[POWERPC_EXCP_PROGRAM]  = 0x0700;
>> +env->excp_vectors[POWERPC_EXCP_FPU]  = 0x0800;
>
> I have a vague recollection from my days of working on 405 that there
> may have been something funky with FP emulation on there: e.g. FP
> instructions causing 0x700 program interrupts instead of FP unavailable
> interrupts or something.

Maybe this (from the manual):

  Program - causing conditions:

  Attempted execution of illegal instructions, TRAP instruction,
  privileged instruction in problem state, or auxiliary processor (APU)
  instruction, or unimplemented FPU instruction, or unimplemented APU
  instruction, or APU interrupt, or FPU interrupt

  FPU Unavailable - causing conditions:

  Attempted execution of an FPU instruction when MSR[FP]=0.

There's also this bit:

  Attempted execution of an APU instruction while the APUc405exception
  signal is asserted) results in a program interrupt. Similarly, attempted
  execution of an FPU instruction while the FPUc405exception signal is
  asserted) also results in a program interrupt. The following also result
  in program interrupts: attempted execution of an APU instruction while
  APUc405DcdAPUOp is asserted but APUC405DcdValidOp is deasserted; and
  attempted execution of an FPU instruction while APUc405DcdFpuOp but
  APUC405DcdValidOp is deasserted.

> I might be remembering incorrectly - the manual does seem to imply
> that 0x800 FP unavailable is there as normal, but it might be worth
> double checking this (against real hardware, if possible).

The Linux kernel has the vectors that I'm adding disabled:

  EXCEPTION(0x0800, Trap_08, unknown_exception) <-- FPU
  EXCEPTION(0x0900, Trap_09, unknown_exception)
  EXCEPTION(0x0A00, Trap_0A, unknown_exception) 
  EXCEPTION(0x0B00, Trap_0B, unknown_exception)
  ...
  EXCEPTION(0x0F00, Trap_0F, unknown_exception) <-- APU

(0xf20 would probably cause a crash as we'd jump to the middle of the
exception prologue)

Maybe I should drop this patch then? That way future developers won't
feel tempted to raise one of these.

It seems mostly inconsequential either way, what do you think?

Re: [PATCH v2 3/9] fixup: force interp off for QEMU machine 6.2 and older


On 1/14/22 3:38 PM, Matthew Rosato wrote:

Double-check I'm doing this right + test.



Argh...  This should have been squashed into the preceding patch 
'target/s390x: add zpci-interp to cpu models'



Signed-off-by: Matthew Rosato 
---
  hw/s390x/s390-virtio-ccw.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 84e3e63c43..e02fe11b07 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -803,6 +803,7 @@ DEFINE_CCW_MACHINE(7_0, "7.0", true);
  static void ccw_machine_6_2_instance_options(MachineState *machine)
  {
  ccw_machine_7_0_instance_options(machine);
+s390_cpudef_featoff_greater(14, 1, S390_FEAT_ZPCI_INTERP);
  }
  
  static void ccw_machine_6_2_class_options(MachineClass *mc)

Re: [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush

2022-01-14 Thread Idan Horowitz

Idan Horowitz  wrote:
>
> cbnz x9, 0x5168abc8
>

I forgot to include the addresses of the instructions, making this
jump undecipherable, here's the snippet again but with addresses this
time:

0x5168abb0 movk x0, #0x0
0x5168abb4 movk x0, #0x0, lsl #16
0x5168abb8 movk x0, #0xff80, lsl #32
0x5168abbc movk x0, #0x0, lsl #48
0x5168abc0 mov  x9, #0x64
0x5168abc4 str  x9, [x8]
0x5168abc8 tlbi rvae1, x0
0x5168abcc ldr  x9, [x8]
0x5168abd0 sub  x9, x9, #0x1
0x5168abd4 str  x9, [x8]
0x5168abd8 cbnz x9, 0x5168abc8

>
> Idan Horowitz

Idan Horowitz

Re: [PATCH] linux-user: Fix comment typo in arm cpu_loop code

On Fri, Jan 14, 2022 at 11:25 AM Peter Maydell wrote:

> Fix a typo in a comment in the arm cpu_loop code.
>
> Signed-off-by: Peter Maydell 
> ---
>  linux-user/arm/cpu_loop.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>

Reviewed-by: Warner Losh 


> diff --git a/linux-user/arm/cpu_loop.c b/linux-user/arm/cpu_loop.c
> index f153ab503a8..032e1ffddfb 100644
> --- a/linux-user/arm/cpu_loop.c
> +++ b/linux-user/arm/cpu_loop.c
> @@ -434,8 +434,8 @@ void cpu_loop(CPUARMState *env)
>  case 0x6: /* Access flag fault, level 2 */
>  case 0x9: /* Domain fault, level 1 */
>  case 0xb: /* Domain fault, level 2 */
> -case 0xd: /* Permision fault, level 1 */
> -case 0xf: /* Permision fault, level 2 */
> +case 0xd: /* Permission fault, level 1 */
> +case 0xf: /* Permission fault, level 2 */
>  si_signo = TARGET_SIGSEGV;
>  si_code = TARGET_SEGV_ACCERR;
>  break;
> --
> 2.25.1
>
>

[PATCH v2 6/9] s390x/pci: enable adapter event notification for interpreted devices

Use the associated vfio feature ioctl to enable adapter event notification
and forwarding for devices when requested.  This feature will be set up
with or without firmware assist based upon the 'intassist' setting.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-bus.c  | 24 --
 hw/s390x/s390-pci-inst.c | 54 +-
 hw/s390x/s390-pci-vfio.c | 79 
 include/hw/s390x/s390-pci-bus.h  |  1 +
 include/hw/s390x/s390-pci-vfio.h | 20 
 5 files changed, 173 insertions(+), 5 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 66649af6e0..6ee70446ca 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -189,7 +189,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
 rc = SCLP_RC_NO_ACTION_REQUIRED;
 break;
 default:
-if (pbdev->summary_ind) {
+if (pbdev->interp) {
+/* Interpreted devices were using interrupt forwarding */
+s390_pci_set_aif(pbdev, NULL, false, pbdev->intassist);
+} else if (pbdev->summary_ind) {
 pci_dereg_irqs(pbdev);
 }
 if (pbdev->iommu->enabled) {
@@ -981,6 +984,11 @@ static int s390_pci_interp_plug(S390pciState *s, 
S390PCIBusDevice *pbdev)
 return rc;
 }
 
+rc = s390_pci_probe_aif(pbdev);
+if (rc) {
+return rc;
+}
+
 rc = s390_pci_update_passthrough_fh(pbdev);
 if (rc) {
 return rc;
@@ -1076,6 +1084,7 @@ static void s390_pcihost_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 if (pbdev->interp && !s390_has_feat(S390_FEAT_ZPCI_INTERP)) {
 DPRINTF("zPCI interpretation facilities missing.\n");
 pbdev->interp = false;
+pbdev->intassist = false;
 }
 if (pbdev->interp) {
 rc = s390_pci_interp_plug(s, pbdev);
@@ -1090,11 +1099,13 @@ static void s390_pcihost_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 if (!pbdev->interp) {
 /* Do vfio passthrough but intercept for I/O */
 pbdev->fh |= FH_SHM_VFIO;
+pbdev->intassist = false;
 }
 } else {
 pbdev->fh |= FH_SHM_EMUL;
 /* Always intercept emulated devices */
 pbdev->interp = false;
+pbdev->intassist = false;
 }
 
 if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
@@ -1244,7 +1255,10 @@ static void s390_pcihost_reset(DeviceState *dev)
 /* Process all pending unplug requests */
 QTAILQ_FOREACH_SAFE(pbdev, &s->zpci_devs, link, next) {
 if (pbdev->unplug_requested) {
-if (pbdev->summary_ind) {
+if (pbdev->interp) {
+/* Interpreted devices were using interrupt forwarding */
+s390_pci_set_aif(pbdev, NULL, false, pbdev->intassist);
+} else if (pbdev->summary_ind) {
 pci_dereg_irqs(pbdev);
 }
 if (pbdev->iommu->enabled) {
@@ -1382,7 +1396,10 @@ static void s390_pci_device_reset(DeviceState *dev)
 break;
 }
 
-if (pbdev->summary_ind) {
+if (pbdev->interp) {
+/* Interpreted devices were using interrupt forwarding */
+s390_pci_set_aif(pbdev, NULL, false, pbdev->intassist);
+} else if (pbdev->summary_ind) {
 pci_dereg_irqs(pbdev);
 }
 if (pbdev->iommu->enabled) {
@@ -1428,6 +1445,7 @@ static Property s390_pci_device_properties[] = {
 DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
 DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
 DEFINE_PROP_BOOL("interp", S390PCIBusDevice, interp, true),
+DEFINE_PROP_BOOL("intassist", S390PCIBusDevice, intassist, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index e9a0dc12e4..121e07cc41 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -,6 +,46 @@ static void fmb_update(void *opaque)
 timer_mod(pbdev->fmb_timer, t + pbdev->pci_group->zpci_group.mui);
 }
 
+static int mpcifc_reg_int_interp(S390PCIBusDevice *pbdev, ZpciFib *fib)
+{
+int rc;
+
+/* Interpreted devices must also use interrupt forwarding */
+rc = s390_pci_get_aif(pbdev, false, pbdev->intassist);
+if (rc) {
+DPRINTF("Bad interrupt forwarding state\n");
+return rc;
+}
+
+rc = s390_pci_set_aif(pbdev, fib, true, pbdev->intassist);
+if (rc) {
+DPRINTF("Failed to enable interrupt forwarding\n");
+return rc;
+}
+
+return 0;
+}
+
+static int mpcifc_dereg_int_interp(S390PCIBusDevice *pbdev, ZpciFib *fib)
+{
+int rc;
+
+/* Interpreted devices were using interrupt forwarding */
+rc = s390_pci_get_aif(pbdev, true, pbdev->intassist);
+if (rc) {
+DPRINTF("Bad interrupt forwarding state\n");
+return rc;
+}
+
+rc = s390_pci_set_aif(p

[PATCH v2 5/9] s390x/pci: don't fence interpreted devices without MSI-X

Lack of MSI-X support is not an issue for interpreted passthrough
devices, so let's let these in.  This will allow, for example, ISM
devices to be passed through -- but only when interpretation is
available and being used.

Reviewed-by: Thomas Huth 
Reviewed-by: Pierre Morel 
Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index a39ccfee05..66649af6e0 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -1097,7 +1097,7 @@ static void s390_pcihost_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 pbdev->interp = false;
 }
 
-if (s390_pci_msix_init(pbdev)) {
+if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
 error_setg(errp, "MSI-X support is mandatory "
"in the S390 architecture");
 return;
-- 
2.27.0
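
The effect of the one-line change above can be written out as a small truth table: the plug is rejected only when MSI-X setup fails and the device is not interpreted. The sketch below is illustrative only. `msix_init()` is a hypothetical stand-in for s390_pci_msix_init() that assumes a nonzero return means failure, and `plug_rejected()` is not a function in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for s390_pci_msix_init(): assumed to return
 * nonzero when the device has no usable MSI-X capability.
 */
static int msix_init(bool has_msix)
{
    return has_msix ? 0 : -1;
}

/* Returns true when the plug request should be rejected. */
bool plug_rejected(bool has_msix, bool interp)
{
    /* Same shape as the patched check: only non-interpreted devices fail. */
    return msix_init(has_msix) && !interp;
}

/* The four cases spelled out. */
void check_plug_rules(void)
{
    assert(plug_rejected(false, false));   /* no MSI-X, intercepted: fenced */
    assert(!plug_rejected(false, true));   /* no MSI-X, interpreted: allowed */
    assert(!plug_rejected(true, false));   /* MSI-X present: allowed */
    assert(!plug_rejected(true, true));
}
```

Because the MSI-X setup call sits on the left of the `&&`, it still runs for interpreted devices; only its failure is tolerated.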

[PATCH v2 4/9] s390x/pci: enable for load/store intepretation

Use the associated vfio feature ioctl to enable interpretation for devices
when requested.  As part of this process, we must use the host function
handle rather than a QEMU-generated one -- this is provided as part of the
ioctl payload.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-bus.c  | 70 +++-
 hw/s390x/s390-pci-inst.c | 63 +++-
 hw/s390x/s390-pci-vfio.c | 52 
 include/hw/s390x/s390-pci-bus.h  |  1 +
 include/hw/s390x/s390-pci-vfio.h | 15 +++
 5 files changed, 199 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 01b58ebc70..a39ccfee05 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -971,12 +971,58 @@ static void s390_pci_update_subordinate(PCIDevice *dev, 
uint32_t nr)
 }
 }
 
+static int s390_pci_interp_plug(S390pciState *s, S390PCIBusDevice *pbdev)
+{
+uint32_t idx;
+int rc;
+
+rc = s390_pci_probe_interp(pbdev);
+if (rc) {
+return rc;
+}
+
+rc = s390_pci_update_passthrough_fh(pbdev);
+if (rc) {
+return rc;
+}
+
+/*
+ * The host device is already in an enabled state, but we always present
+ * the initial device state to the guest as disabled (ZPCI_FS_DISABLED).
+ * Therefore, mask off the enable bit from the passthrough handle until
+ * the guest issues a CLP SET PCI FN later to enable the device.
+ */
+pbdev->fh &= ~FH_MASK_ENABLE;
+
+/* Next, see if the idx is already in-use */
+idx = pbdev->fh & FH_MASK_INDEX;
+if (pbdev->idx != idx) {
+if (s390_pci_find_dev_by_idx(s, idx)) {
+return -EINVAL;
+}
+/*
+ * Update the idx entry with the passed through idx
+ * If the relinquished idx is lower than next_idx, use it
+ * to replace next_idx
+ */
+g_hash_table_remove(s->zpci_table, &pbdev->idx);
+if (idx < s->next_idx) {
+s->next_idx = idx;
+}
+pbdev->idx = idx;
+g_hash_table_insert(s->zpci_table, &pbdev->idx, pbdev);
+}
+
+return 0;
+}
+
 static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
   Error **errp)
 {
 S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
 PCIDevice *pdev = NULL;
 S390PCIBusDevice *pbdev = NULL;
+int rc;
 
 if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
 PCIBridge *pb = PCI_BRIDGE(dev);
@@ -1022,12 +1068,33 @@ static void s390_pcihost_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 set_pbdev_info(pbdev);
 
 if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
-pbdev->fh |= FH_SHM_VFIO;
+/*
+ * By default, interpretation is always requested; if the available
+ * facilities indicate it is not available, fallback to the
+ * intercept model.
+ */
+if (pbdev->interp && !s390_has_feat(S390_FEAT_ZPCI_INTERP)) {
+DPRINTF("zPCI interpretation facilities missing.\n");
+pbdev->interp = false;
+}
+if (pbdev->interp) {
+rc = s390_pci_interp_plug(s, pbdev);
+if (rc) {
+error_setg(errp, "zpci interp plug failed: %d", rc);
+return;
+}
+}
 pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
 /* Fill in CLP information passed via the vfio region */
 s390_pci_get_clp_info(pbdev);
+if (!pbdev->interp) {
+/* Do vfio passthrough but intercept for I/O */
+pbdev->fh |= FH_SHM_VFIO;
+}
 } else {
 pbdev->fh |= FH_SHM_EMUL;
+/* Always intercept emulated devices */
+pbdev->interp = false;
 }
 
 if (s390_pci_msix_init(pbdev)) {
@@ -1360,6 +1427,7 @@ static Property s390_pci_device_properties[] = {
 DEFINE_PROP_UINT16("uid", S390PCIBusDevice, uid, UID_UNDEFINED),
 DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
 DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
+DEFINE_PROP_BOOL("interp", S390PCIBusDevice, interp, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 6d400d4147..e9a0dc12e4 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -18,6 +18,7 @@
 #include "sysemu/hw_accel.h"
 #include "hw/s390x/s390-pci-inst.h"
 #include "hw/s390x/s390-pci-bus.h"
+#include "hw/s390x/s390-pci-vfio.h"
 #include "hw/s390x/tod.h"
 
 #ifndef DEBUG_S390PCI_INST
@@ -156,6 +157,47 @@ out:
 return rc;
 }
 
+static int clp_enable_interp(S390PCIBusDevice *pbdev)
+{
+int rc;
+
+rc = s390_pci_set_interp(pbdev, true);
+if (rc) {
+DPRINTF("Failed to enable interpretation\n");
+
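
The handle manipulation in s390_pci_interp_plug() above can be looked at in isolation: clear the enable bit so the guest initially sees a disabled function, then take the index from the low bits of the handle. This is a sketch under assumptions. The mask values below are illustrative guesses at the handle layout rather than values taken from the patch, and `fh_as_disabled()` and `fh_index()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed function-handle layout; the mask values are illustrative. */
#define FH_MASK_ENABLE 0x80000000u
#define FH_MASK_INDEX  0x0000ffffu

/* Present the (already enabled) host handle to the guest as disabled. */
uint32_t fh_as_disabled(uint32_t host_fh)
{
    uint32_t fh = host_fh & ~FH_MASK_ENABLE;

    assert((fh & FH_MASK_ENABLE) == 0);
    return fh;
}

/* Extract the index that keys the device table. */
uint32_t fh_index(uint32_t fh)
{
    return fh & FH_MASK_INDEX;
}
```

For a host handle of 0x80010003 this yields a guest-visible handle of 0x00010003 and an index of 3. The index is unaffected by the enable bit, so it can be compared against the QEMU-assigned one either before or after masking.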

[PATCH v2 0/9] s390x/pci: zPCI interpretation support

For QEMU, the majority of the work in enabling instruction interpretation
is handled via new VFIO ioctls to SET the appropriate interpretation and
interrupt forwarding modes, and to GET the function handle to use for
interpretive execution.  

This series implements these new ioctls and adds two new, optional zpci
parameters: 'intercept', to request that interpretation support not be
used, and 'intassist', to determine whether the firmware assist will be
used for interrupt delivery or the host will be responsible for
delivering all interrupts.

The ZPCI_INTERP CPU feature is added beginning with the z14 model to
enable this support.

As a consequence of implementing zPCI interpretation, ISM devices now
become eligible for passthrough (but only when zPCI interpretation is
available).

From the perspective of guest configuration, you pass through zPCI devices
in the same manner as before, with interpretation support being used by
default if available in kernel+qemu.

Associated kernel series:
https://lore.kernel.org/kvm/20220114203145.242984-1-mjros...@linux.ibm.com/

Changes v1->v2:

- Update kernel headers sync
- Drop some pre-req patches that are now merged 
- Add some R-bs (Thanks!)   
- fence FEAT_ZPCI_INTERP for QEMU 6.2 and older (Christian) 
- switch from container_of to VFIO_PCI and drop asserts (Thomas)
- re-arrange g_autofree so we malloc at time of declaration (Thomas) 

Matthew Rosato (9):
  Update linux headers
  target/s390x: add zpci-interp to cpu models
  fixup: force interp off for QEMU machine 6.2 and older
  s390x/pci: enable for load/store intepretation
  s390x/pci: don't fence interpreted devices without MSI-X
  s390x/pci: enable adapter event notification for interpreted devices
  s390x/pci: use I/O Address Translation assist when interpreting
  s390x/pci: use dtsm provided from vfio capabilities for interpreted
devices
  s390x/pci: let intercept devices have separate PCI groups

 hw/s390x/s390-pci-bus.c   | 121 ++-
 hw/s390x/s390-pci-inst.c  | 168 ++-
 hw/s390x/s390-pci-vfio.c  | 204 +-
 hw/s390x/s390-virtio-ccw.c|   1 +
 include/hw/s390x/s390-pci-bus.h   |   8 +-
 include/hw/s390x/s390-pci-inst.h  |   2 +-
 include/hw/s390x/s390-pci-vfio.h  |  45 
 include/standard-headers/asm-x86/kvm_para.h   |   1 +
 include/standard-headers/drm/drm_fourcc.h |  11 +
 include/standard-headers/linux/ethtool.h  |   1 +
 include/standard-headers/linux/fuse.h |  60 +-
 include/standard-headers/linux/pci_regs.h |   4 +
 include/standard-headers/linux/virtio_iommu.h |   8 +-
 linux-headers/asm-mips/unistd_n32.h   |   1 +
 linux-headers/asm-mips/unistd_n64.h   |   1 +
 linux-headers/asm-mips/unistd_o32.h   |   1 +
 linux-headers/asm-powerpc/unistd_32.h |   1 +
 linux-headers/asm-powerpc/unistd_64.h |   1 +
 linux-headers/asm-s390/kvm.h  |   1 +
 linux-headers/asm-s390/unistd_32.h|   1 +
 linux-headers/asm-s390/unistd_64.h|   1 +
 linux-headers/linux/kvm.h |   1 +
 linux-headers/linux/vfio.h|  22 ++
 linux-headers/linux/vfio_zdev.h   |  51 +
 target/s390x/cpu_features_def.h.inc   |   1 +
 target/s390x/gen-features.c   |   2 +
 target/s390x/kvm/kvm.c|   1 +
 27 files changed, 693 insertions(+), 27 deletions(-)

-- 
2.27.0

[PATCH v2 1/9] Update linux headers

This is a placeholder that pulls in 5.17 + unmerged kernel changes
required by this item.  A proper header sync can be done once the
associated kernel code merges.

Signed-off-by: Matthew Rosato 
---
 include/standard-headers/asm-x86/kvm_para.h   |  1 +
 include/standard-headers/drm/drm_fourcc.h | 11 
 include/standard-headers/linux/ethtool.h  |  1 +
 include/standard-headers/linux/fuse.h | 60 +--
 include/standard-headers/linux/pci_regs.h |  4 ++
 include/standard-headers/linux/virtio_iommu.h |  8 ++-
 linux-headers/asm-mips/unistd_n32.h   |  1 +
 linux-headers/asm-mips/unistd_n64.h   |  1 +
 linux-headers/asm-mips/unistd_o32.h   |  1 +
 linux-headers/asm-powerpc/unistd_32.h |  1 +
 linux-headers/asm-powerpc/unistd_64.h |  1 +
 linux-headers/asm-s390/kvm.h  |  1 +
 linux-headers/asm-s390/unistd_32.h|  1 +
 linux-headers/asm-s390/unistd_64.h|  1 +
 linux-headers/linux/kvm.h |  1 +
 linux-headers/linux/vfio.h| 22 +++
 linux-headers/linux/vfio_zdev.h   | 51 
 17 files changed, 162 insertions(+), 5 deletions(-)

diff --git a/include/standard-headers/asm-x86/kvm_para.h 
b/include/standard-headers/asm-x86/kvm_para.h
index 204cfb8640..f0235e58a1 100644
--- a/include/standard-headers/asm-x86/kvm_para.h
+++ b/include/standard-headers/asm-x86/kvm_para.h
@@ -8,6 +8,7 @@
  * should be used to determine that a VM is running under KVM.
  */
 #define KVM_CPUID_SIGNATURE 0x40000000
+#define KVM_SIGNATURE "KVMKVMKVM\0\0\0"
 
 /* This CPUID returns two feature bitmaps in eax, edx. Before enabling
  * a particular paravirtualization, the appropriate feature bit should
diff --git a/include/standard-headers/drm/drm_fourcc.h 
b/include/standard-headers/drm/drm_fourcc.h
index 2c025cb4fe..4888f85f69 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -313,6 +313,13 @@ extern "C" {
  */
 #define DRM_FORMAT_P016 fourcc_code('P', '0', '1', '6') /* 2x2 subsampled Cr:Cb plane 16 bits per channel */
 
+/* 2 plane YCbCr420.
+ * 3 10 bit components and 2 padding bits packed into 4 bytes.
+ * index 0 = Y plane, [31:0] x:Y2:Y1:Y0 2:10:10:10 little endian
+ * index 1 = Cr:Cb plane, [63:0] x:Cr2:Cb2:Cr1:x:Cb1:Cr0:Cb0 [2:10:10:10:2:10:10:10] little endian
+ */
+#define DRM_FORMAT_P030 fourcc_code('P', '0', '3', '0') /* 2x2 subsampled Cr:Cb plane 10 bits per channel packed */
+
 /* 3 plane non-subsampled (444) YCbCr
  * 16 bits per component, but only 10 bits are used and 6 bits are padded
  * index 0: Y plane, [15:0] Y:x [10:6] little endian
@@ -853,6 +860,10 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t 
modifier)
  * and UV.  Some SAND-using hardware stores UV in a separate tiled
  * image from Y to reduce the column height, which is not supported
  * with these modifiers.
+ *
+ * The DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT modifier is also
+ * supported for DRM_FORMAT_P030 where the columns remain as 128 bytes
+ * wide, but as this is a 10 bpp format that translates to 96 pixels.
  */
 
 #define DRM_FORMAT_MOD_BROADCOM_SAND32_COL_HEIGHT(v) \
diff --git a/include/standard-headers/linux/ethtool.h 
b/include/standard-headers/linux/ethtool.h
index 688eb8dc39..38d5a4cd6e 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -231,6 +231,7 @@ enum tunable_id {
ETHTOOL_RX_COPYBREAK,
ETHTOOL_TX_COPYBREAK,
ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */
+   ETHTOOL_TX_COPYBREAK_BUF_SIZE,
/*
 * Add your fresh new tunable attribute above and remember to update
 * tunable_strings[] in net/ethtool/common.c
diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index 23ea31708b..bda06258be 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -184,6 +184,16 @@
  *
  *  7.34
  *  - add FUSE_SYNCFS
+ *
+ *  7.35
+ *  - add FOPEN_NOFLUSH
+ *
+ *  7.36
+ *  - extend fuse_init_in with reserved fields, add FUSE_INIT_EXT init flag
+ *  - add flags2 to fuse_init_in and fuse_init_out
+ *  - add FUSE_SECURITY_CTX init flag
+ *  - add security context to create, mkdir, symlink, and mknod requests
+ *  - add FUSE_HAS_INODE_DAX, FUSE_ATTR_DAX
  */
 
 #ifndef _LINUX_FUSE_H
@@ -215,7 +225,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 34
+#define FUSE_KERNEL_MINOR_VERSION 36
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -286,12 +296,14 @@ struct fuse_file_lock {
  * FOPEN_NONSEEKABLE: the file is not seekable
  * FOPEN_CACHE_DIR: allow caching this directory
  * FOPEN_STREAM: the file is stream-like (no file position at all)
+ * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)

[PATCH v2 8/9] s390x/pci: use dtsm provided from vfio capabilities for interpreted devices

When using the IOAT assist via interpretation, we should advertise what
the host driver supports, not QEMU.

Reviewed-by: Pierre Morel 
Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-vfio.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 4e3165bff5..347cbdfdf8 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -318,7 +318,11 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 resgrp->i = cap->noi;
 resgrp->maxstbl = cap->maxstbl;
 resgrp->version = cap->version;
-resgrp->dtsm = ZPCI_DTSM;
+if (hdr->version >= 2 && pbdev->interp) {
+resgrp->dtsm = cap->dtsm;
+} else {
+resgrp->dtsm = ZPCI_DTSM;
+}
 }
 }
 
-- 
2.27.0

[PATCH v2 9/9] s390x/pci: let intercept devices have separate PCI groups

Let's use the reserved pool of simulated PCI groups to allow intercept
devices to have separate groups from interpreted devices as some group
values may be different. If we run out of simulated PCI groups, subsequent
intercept devices just get the default group.
Furthermore, if we encounter any PCI groups from hostdevs that are marked
as simulated, let's just assign them to the default group to avoid
conflicts between host simulated groups and our own simulated groups.
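
The allocation policy described above can be sketched in isolation as follows. This is only an illustration under stated assumptions: the `Sketch*` types, the `sketch_*` functions, and the numeric values of the two constants are hypothetical stand-ins, not the QEMU definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; the real constants live in QEMU's headers. */
#define SKETCH_SIM_GRP_START  0xF0
#define SKETCH_DEFAULT_FN_GRP 0xFF
#define SKETCH_MAX_GROUPS     32

typedef struct {
    int id;      /* group id presented to the guest */
    int host_id; /* group id reported by the host */
} SketchGroup;

typedef struct {
    SketchGroup groups[SKETCH_MAX_GROUPS];
    size_t ngroups;
    int next_sim_grp; /* next unused simulated group id */
} SketchState;

static void sketch_init(SketchState *s)
{
    s->ngroups = 0;
    s->next_sim_grp = SKETCH_SIM_GRP_START;
}

/* Pick the guest-visible group id for an intercept device in host group host_id. */
static int sketch_assign_intercept_group(SketchState *s, int host_id)
{
    size_t i;

    /* Host groups that are themselves simulated go to the default group. */
    if (host_id >= SKETCH_SIM_GRP_START) {
        return SKETCH_DEFAULT_FN_GRP;
    }

    /* Reuse a simulated group already created for this host group. */
    for (i = 0; i < s->ngroups; i++) {
        if (s->groups[i].id >= SKETCH_SIM_GRP_START &&
            s->groups[i].host_id == host_id) {
            return s->groups[i].id;
        }
    }

    /* Pool exhausted: fall back to the default group. */
    if (s->next_sim_grp == SKETCH_DEFAULT_FN_GRP ||
        s->ngroups == SKETCH_MAX_GROUPS) {
        return SKETCH_DEFAULT_FN_GRP;
    }

    /* Otherwise hand out the next simulated group id. */
    assert(s->ngroups < SKETCH_MAX_GROUPS);
    s->groups[s->ngroups].id = s->next_sim_grp;
    s->groups[s->ngroups].host_id = host_id;
    s->ngroups++;
    return s->next_sim_grp++;
}
```

With these assumed values the pool holds 15 simulated ids (0xF0 through 0xFE); two intercept devices from the same host group share one id, and a device that arrives after the pool is used up gets 0xFF.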

Reviewed-by: Pierre Morel 
Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-bus.c | 19 ++--
 hw/s390x/s390-pci-vfio.c| 40 ++---
 include/hw/s390x/s390-pci-bus.h |  6 -
 3 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 49ae2fd0ea..705a19ddb9 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -747,13 +747,14 @@ static void s390_pci_iommu_free(S390pciState *s, PCIBus *bus, int32_t devfn)
 object_unref(OBJECT(iommu));
 }
 
-S390PCIGroup *s390_group_create(int id)
+S390PCIGroup *s390_group_create(int id, int host_id)
 {
 S390PCIGroup *group;
 S390pciState *s = s390_get_phb();
 
 group = g_new0(S390PCIGroup, 1);
 group->id = id;
+group->host_id = host_id;
 QTAILQ_INSERT_TAIL(&s->zpci_groups, group, link);
 return group;
 }
@@ -771,12 +772,25 @@ S390PCIGroup *s390_group_find(int id)
 return NULL;
 }
 
+S390PCIGroup *s390_group_find_host_sim(int host_id)
+{
+S390PCIGroup *group;
+S390pciState *s = s390_get_phb();
+
+QTAILQ_FOREACH(group, &s->zpci_groups, link) {
+if (group->id >= ZPCI_SIM_GRP_START && group->host_id == host_id) {
+return group;
+}
+}
+return NULL;
+}
+
 static void s390_pci_init_default_group(void)
 {
 S390PCIGroup *group;
 ClpRspQueryPciGrp *resgrp;
 
-group = s390_group_create(ZPCI_DEFAULT_FN_GRP);
+group = s390_group_create(ZPCI_DEFAULT_FN_GRP, ZPCI_DEFAULT_FN_GRP);
 resgrp = &group->zpci_group;
 resgrp->fr = 1;
 resgrp->dasm = 0;
@@ -824,6 +838,7 @@ static void s390_pcihost_realize(DeviceState *dev, Error **errp)
NULL, g_free);
 s->zpci_table = g_hash_table_new_full(g_int_hash, g_int_equal, NULL, NULL);
 s->bus_no = 0;
+s->next_sim_grp = ZPCI_SIM_GRP_START;
 QTAILQ_INIT(&s->pending_sei);
 QTAILQ_INIT(&s->zpci_devs);
 QTAILQ_INIT(&s->zpci_dma_limit);
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 347cbdfdf8..e7e6eca60c 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -287,13 +287,17 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 {
 struct vfio_info_cap_header *hdr;
 struct vfio_device_info_cap_zpci_group *cap;
+S390pciState *s = s390_get_phb();
 ClpRspQueryPciGrp *resgrp;
 VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, pdev);
 
 hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
 
-/* If capability not provided, just use the default group */
-if (hdr == NULL) {
+/*
+ * If capability not provided or the underlying hostdev is simulated, just
+ * use the default group.
+ */
+if (hdr == NULL || pbdev->zpci_fn.pfgid >= ZPCI_SIM_GRP_START) {
 trace_s390_pci_clp_cap(vpci->vbasedev.name,
VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
 pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
@@ -302,11 +306,41 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 }
 cap = (void *) hdr;
 
+/*
+ * For an intercept device, let's use an existing simulated group if one
+ * was already created for other intercept devices in this group.
+ * If not, create a new simulated group if any are still available.
+ * If all else fails, just fall back on the default group.
+ */
+if (!pbdev->interp) {
+pbdev->pci_group = s390_group_find_host_sim(pbdev->zpci_fn.pfgid);
+if (pbdev->pci_group) {
+/* Use existing simulated group */
+pbdev->zpci_fn.pfgid = pbdev->pci_group->id;
+return;
+} else {
+if (s->next_sim_grp == ZPCI_DEFAULT_FN_GRP) {
+/* All out of simulated groups, use default */
+trace_s390_pci_clp_cap(vpci->vbasedev.name,
+   VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
+pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
+pbdev->pci_group = s390_group_find(ZPCI_DEFAULT_FN_GRP);
+return;
+} else {
+/* We can assign a new simulated group */
+pbdev->zpci_fn.pfgid = s->next_sim_grp;
+s->next_sim_grp++;
+/* Fall through to create the new sim group using CLP info */
+}
+}
+}
+
 /* See if the PCI group is already defined, create if not */

[PATCH v2 7/9] s390x/pci: use I/O Address Translation assist when interpreting

Allow the underlying kvm host to handle the Refresh PCI Translation
instruction intercepts.

Reviewed-by: Pierre Morel 
Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-bus.c  |  6 ++--
 hw/s390x/s390-pci-inst.c | 51 ++--
 hw/s390x/s390-pci-vfio.c | 27 +
 include/hw/s390x/s390-pci-inst.h |  2 +-
 include/hw/s390x/s390-pci-vfio.h | 10 +++
 5 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 6ee70446ca..49ae2fd0ea 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -196,7 +196,7 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
 pci_dereg_irqs(pbdev);
 }
 if (pbdev->iommu->enabled) {
-pci_dereg_ioat(pbdev->iommu);
+pci_dereg_ioat(pbdev);
 }
 pbdev->state = ZPCI_FS_STANDBY;
 rc = SCLP_RC_NORMAL_COMPLETION;
@@ -1262,7 +1262,7 @@ static void s390_pcihost_reset(DeviceState *dev)
 pci_dereg_irqs(pbdev);
 }
 if (pbdev->iommu->enabled) {
-pci_dereg_ioat(pbdev->iommu);
+pci_dereg_ioat(pbdev);
 }
 pbdev->state = ZPCI_FS_STANDBY;
 s390_pci_perform_unplug(pbdev);
@@ -1403,7 +1403,7 @@ static void s390_pci_device_reset(DeviceState *dev)
 pci_dereg_irqs(pbdev);
 }
 if (pbdev->iommu->enabled) {
-pci_dereg_ioat(pbdev->iommu);
+pci_dereg_ioat(pbdev);
 }
 
 fmb_timer_free(pbdev);
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 121e07cc41..0fa18866c0 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -978,6 +978,24 @@ int pci_dereg_irqs(S390PCIBusDevice *pbdev)
 return 0;
 }
 
+static int reg_ioat_interp(S390PCIBusDevice *pbdev, uint64_t iota)
+{
+int rc;
+
+rc = s390_pci_probe_ioat(pbdev);
+if (rc) {
+return rc;
+}
+
+rc = s390_pci_set_ioat(pbdev, iota);
+if (rc) {
+return rc;
+}
+
+pbdev->iommu->enabled = true;
+return 0;
+}
+
 static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
 uintptr_t ra)
 {
@@ -995,6 +1013,16 @@ static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
 return -EINVAL;
 }
 
+/* If this is an interpreted device, we must use the IOAT assist */
+if (pbdev->interp) {
+if (reg_ioat_interp(pbdev, g_iota)) {
+error_report("failure starting ioat assist");
+s390_program_interrupt(env, PGM_OPERAND, ra);
+return -EINVAL;
+}
+return 0;
+}
+
 /* currently we only support designation type 1 with translation */
 if (!(dt == ZPCI_IOTA_RTTO && t)) {
 error_report("unsupported ioat dt %d t %d", dt, t);
@@ -1011,8 +1039,25 @@ static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
 return 0;
 }
 
-void pci_dereg_ioat(S390PCIIOMMU *iommu)
+static void dereg_ioat_interp(S390PCIBusDevice *pbdev)
 {
+if (s390_pci_probe_ioat(pbdev) != 0) {
+return;
+}
+
+s390_pci_set_ioat(pbdev, 0);
+pbdev->iommu->enabled = false;
+}
+
+void pci_dereg_ioat(S390PCIBusDevice *pbdev)
+{
+S390PCIIOMMU *iommu = pbdev->iommu;
+
+if (pbdev->interp) {
+dereg_ioat_interp(pbdev);
+return;
+}
+
 s390_pci_iommu_disable(iommu);
 iommu->pba = 0;
 iommu->pal = 0;
@@ -1251,7 +1296,7 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
 cc = ZPCI_PCI_LS_ERR;
 s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
 } else {
-pci_dereg_ioat(pbdev->iommu);
+pci_dereg_ioat(pbdev);
 }
 break;
 case ZPCI_MOD_FC_REREG_IOAT:
@@ -1262,7 +1307,7 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
 cc = ZPCI_PCI_LS_ERR;
 s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
 } else {
-pci_dereg_ioat(pbdev->iommu);
+pci_dereg_ioat(pbdev);
 if (reg_ioat(env, pbdev, fib, ra)) {
 cc = ZPCI_PCI_LS_ERR;
 s390_set_status_code(env, r1, ZPCI_MOD_ST_INSUF_RES);
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 73f3b3ed19..4e3165bff5 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -228,6 +228,33 @@ int s390_pci_get_aif(S390PCIBusDevice *pbdev, bool enable, bool assist)
 return rc;
 }
 
+int s390_pci_probe_ioat(S390PCIBusDevice *pbdev)
+{
+VFIOPCIDevice *vdev = VFIO_PCI(pbdev->pdev);
+struct vfio_device_feature feat = {
+.argsz = sizeof(struct vfio_device_feature),
+.flags = VFIO_DEVICE_FEATURE_PROBE + VFIO_DEVICE_FEATURE_ZPCI_IOAT
+};
+
+return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, &feat);
+}
+
+int s390_pci_set_ioat(S390PCIBusDevice *pbde

[PATCH v2 2/9] target/s390x: add zpci-interp to cpu models

The zpci-interp feature is used to specify whether zPCI interpretation is
to be used for this guest.

Signed-off-by: Matthew Rosato 
---
 target/s390x/cpu_features_def.h.inc | 1 +
 target/s390x/gen-features.c | 2 ++
 target/s390x/kvm/kvm.c  | 1 +
 3 files changed, 4 insertions(+)

diff --git a/target/s390x/cpu_features_def.h.inc b/target/s390x/cpu_features_def.h.inc
index e86662bb3b..4ade3182aa 100644
--- a/target/s390x/cpu_features_def.h.inc
+++ b/target/s390x/cpu_features_def.h.inc
@@ -146,6 +146,7 @@ DEF_FEAT(SIE_CEI, "cei", SCLP_CPU, 43, "SIE: Conditional-external-interception f
 DEF_FEAT(DAT_ENH_2, "dateh2", MISC, 0, "DAT-enhancement facility 2")
 DEF_FEAT(CMM, "cmm", MISC, 0, "Collaborative-memory-management facility")
 DEF_FEAT(AP, "ap", MISC, 0, "AP instructions installed")
+DEF_FEAT(ZPCI_INTERP, "zpci-interp", MISC, 0, "zPCI interpretation")
 
 /* Features exposed via the PLO instruction. */
 DEF_FEAT(PLO_CL, "plo-cl", PLO, 0, "PLO Compare and load (32 bit in general registers)")
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 7cb1a6ec10..7005d22415 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -554,6 +554,7 @@ static uint16_t full_GEN14_GA1[] = {
 S390_FEAT_HPMA2,
 S390_FEAT_SIE_KSS,
 S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
+S390_FEAT_ZPCI_INTERP,
 };
 
 #define full_GEN14_GA2 EmptyFeat
@@ -650,6 +651,7 @@ static uint16_t default_GEN14_GA1[] = {
 S390_FEAT_GROUP_MSA_EXT_8,
 S390_FEAT_MULTIPLE_EPOCH,
 S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
+S390_FEAT_ZPCI_INTERP,
 };
 
 #define default_GEN14_GA2 EmptyFeat
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 6acf14d5ec..0357bfda89 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2294,6 +2294,7 @@ static int kvm_to_feat[][2] = {
 { KVM_S390_VM_CPU_FEAT_PFMFI, S390_FEAT_SIE_PFMFI},
 { KVM_S390_VM_CPU_FEAT_SIGPIF, S390_FEAT_SIE_SIGPIF},
 { KVM_S390_VM_CPU_FEAT_KSS, S390_FEAT_SIE_KSS},
+{ KVM_S390_VM_CPU_FEAT_ZPCI_INTERP, S390_FEAT_ZPCI_INTERP },
 };
 
 static int query_cpu_feat(S390FeatBitmap features)
-- 
2.27.0

[PATCH v2 3/9] fixup: force interp off for QEMU machine 6.2 and older

Double-check I'm doing this right + test.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-virtio-ccw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 84e3e63c43..e02fe11b07 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -803,6 +803,7 @@ DEFINE_CCW_MACHINE(7_0, "7.0", true);
 static void ccw_machine_6_2_instance_options(MachineState *machine)
 {
 ccw_machine_7_0_instance_options(machine);
+s390_cpudef_featoff_greater(14, 1, S390_FEAT_ZPCI_INTERP);
 }
 
 static void ccw_machine_6_2_class_options(MachineClass *mc)
-- 
2.27.0

Re: [PATCH v3 23/31] iotests: switch to AQMP

2022-01-14 Thread Eric Blake

On Mon, Jan 10, 2022 at 06:29:02PM -0500, John Snow wrote:
> Simply import the type defition from the new location.

definition

> 
> Signed-off-by: John Snow 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Beraldo Leal 
> ---
>  tests/qemu-iotests/iotests.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 10/30] bsd-user/signal.c: Implement signal_init()

On Thu, Jan 13, 2022 at 12:28 PM Peter Maydell 
wrote:

> On Sun, 9 Jan 2022 at 16:29, Warner Losh  wrote:
> >
> > Initialize the signal state for the emulator. Setup a set of sane
> > default signal handlers, mirroring the host's signals. For fatal signals
> > (those that exit by default), establish our own set of signal
> > handlers. Stub out the actual signal handler we use for the moment.
> >
> > Signed-off-by: Stacey Son 
> > Signed-off-by: Kyle Evans 
> > Signed-off-by: Warner Losh 
> > ---
> >  bsd-user/qemu.h   |  1 +
> >  bsd-user/signal.c | 68 +++
> >  2 files changed, 69 insertions(+)
>
> > +static struct target_sigaction sigact_table[TARGET_NSIG];
>
>
>
>
> >  void signal_init(void)
> >  {
> > +TaskState *ts = (TaskState *)thread_cpu->opaque;
> > +struct sigaction act;
> > +struct sigaction oact;
> > +int i;
> > +int host_sig;
> > +
> > +/* Set the signal mask from the host mask. */
> > +sigprocmask(0, 0, &ts->signal_mask);
> > +
> > +/*
> > + * Set all host signal handlers. ALL signals are blocked during the
> > + * handlers to serialize them.
> > + */
> > +memset(sigact_table, 0, sizeof(sigact_table));
>
> Do you need this memset()? sigact_table is a global, so it's
> zero-initialized on startup, and this function is only called once.
> The (otherwise basically identical) Linux version of this function
> doesn't have it.
>

Yea, that looks bogus. I'll remove it.
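
The point about static storage can be checked with a small standalone sketch. The names here (`sketch_table`, `SKETCH_NSIG`, `struct sketch_sigaction`) are hypothetical stand-ins for `sigact_table`, `TARGET_NSIG`, and `struct target_sigaction`; the array size is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NSIG 128 /* stand-in for TARGET_NSIG; the value is an assumption */

struct sketch_sigaction {
    void *handler;
    unsigned long flags;
};

/*
 * An object with static storage duration and no initializer is
 * zero-initialized before program startup, so no memset() is needed
 * as long as nothing has written to it yet.
 */
static struct sketch_sigaction sketch_table[SKETCH_NSIG];

/* Returns 1 if every entry is still in its zero state, 0 otherwise. */
static int sketch_table_is_zeroed(void)
{
    size_t i;

    for (i = 0; i < SKETCH_NSIG; i++) {
        if (sketch_table[i].handler != NULL || sketch_table[i].flags != 0) {
            return 0;
        }
    }
    return 1;
}
```

Calling `sketch_table_is_zeroed()` before any writes returns 1, which is why the memset() is redundant when the init function runs only once.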


> > +
> > +sigfillset(&act.sa_mask);
> > +act.sa_sigaction = host_signal_handler;
> > +act.sa_flags = SA_SIGINFO;
> > +
> > +for (i = 1; i <= TARGET_NSIG; i++) {
> > +host_sig = target_to_host_signal(i);
> > +sigaction(host_sig, NULL, &oact);
> > +if (oact.sa_sigaction == (void *)SIG_IGN) {
> > +sigact_table[i - 1]._sa_handler = TARGET_SIG_IGN;
> > +} else if (oact.sa_sigaction == (void *)SIG_DFL) {
> > +sigact_table[i - 1]._sa_handler = TARGET_SIG_DFL;
> > +}
> > +/*
> > + * If there's already a handler installed then something has
> > + * gone horribly wrong, so don't even try to handle that case.
> > + * Install some handlers for our own use.  We need at least
> > + * SIGSEGV and SIGBUS, to detect exceptions.  We can not just
> > + * trap all signals because it affects syscall interrupt
> > + * behavior.  But do trap all default-fatal signals.
> > + */
> > +if (fatal_signal(i)) {
> > +sigaction(host_sig, &act, NULL);
> > +}
> > +}
> >  }
>
> Otherwise
>
> Reviewed-by: Peter Maydell 
>

thanks!

Warner

Re: [PATCH 09/30] bsd-user/signal.c: implement abstract target / host signal translation

On Thu, Jan 13, 2022 at 10:45 AM Peter Maydell 
wrote:

> On Sun, 9 Jan 2022 at 16:29, Warner Losh  wrote:
> >
> > Implement host_to_target_signal and target_to_host_signal.
> >
> > Signed-off-by: Stacey Son 
> > Signed-off-by: Kyle Evans 
> > Signed-off-by: Warner Losh 
> > ---
> >  bsd-user/qemu.h   |  2 ++
> >  bsd-user/signal.c | 11 +++
> >  2 files changed, 13 insertions(+)
> >
> > diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
> > index 1b3b974afe9..334f8b1d715 100644
> > --- a/bsd-user/qemu.h
> > +++ b/bsd-user/qemu.h
> > @@ -210,6 +210,8 @@ long do_sigreturn(CPUArchState *env);
> >  long do_rt_sigreturn(CPUArchState *env);
> >  void queue_signal(CPUArchState *env, int sig, target_siginfo_t *info);
> >  abi_long do_sigaltstack(abi_ulong uss_addr, abi_ulong uoss_addr,
> abi_ulong sp);
> > +int target_to_host_signal(int sig);
> > +int host_to_target_signal(int sig);
> >
> >  /* mmap.c */
> >  int target_mprotect(abi_ulong start, abi_ulong len, int prot);
> > diff --git a/bsd-user/signal.c b/bsd-user/signal.c
> > index 844dfa19095..7ea86149981 100644
> > --- a/bsd-user/signal.c
> > +++ b/bsd-user/signal.c
> > @@ -2,6 +2,7 @@
> >   *  Emulation of BSD signals
> >   *
> >   *  Copyright (c) 2003 - 2008 Fabrice Bellard
> > + *  Copyright (c) 2013 Stacey Son
> >   *
> >   *  This program is free software; you can redistribute it and/or modify
> >   *  it under the terms of the GNU General Public License as published by
> > @@ -27,6 +28,16 @@
> >   * fork.
> >   */
> >
> > +int host_to_target_signal(int sig)
> > +{
> > +return sig;
> > +}
> > +
> > +int target_to_host_signal(int sig)
> > +{
> > +return sig;
> > +}
> > +
>
> This could use a comment:
>
> /*
>  * For the BSDs signal numbers are always the same regardless of
>  * CPU architecture, so (unlike Linux) these functions are just
>  * the identity mapping.
>  */
>
> (assuming that is correct, of course!)
>

It's true enough. Even though there's code to run FooBSD on BarBSD,
that code doesn't work (at all really) today. It would take some doing to
get that working, so I've added a comment that the encoding might not
be the same in that case, but otherwise is the same.

This issue is one I'm deferring doing anything on until I can get things
upstreamed... It's a bit of a mess if you aren't on FreeBSD, but neither the
NetBSD nor OpenBSD communities are using bsd-user because it's so
broken and incomplete in implementing their ABIs.


> Otherwise
> Reviewed-by: Peter Maydell 
>

Thanks!

Warner

Re: [PATCH 08/30] bsd-user/arm/target_arch_cpu.h: Implement data faults

On Fri, 14 Jan 2022 at 18:14, Warner Losh  wrote:
>
>
>
> On Thu, Jan 13, 2022 at 10:40 AM Peter Maydell  
> wrote:
>>
>> On Sun, 9 Jan 2022 at 16:29, Warner Losh  wrote:
>> >
>> > Update for the richer set of data faults that are now possible. Copied
>> > largely from linux-user/arm/cpu_loop.c
>> >
>> > Signed-off-by: Warner Losh 

>> "Permission" (I see we have this typo in linux-user).
>
>
> Fixed. Also, if you can, please cc me if you'd like on 'back ported' fixes
> into linux-user when you post them for review that arise from this. It helps
> me keep track and not miss them in this rather high volume mailing list.

Sure, I can do that. Already posted this afternoon:
https://patchew.org/QEMU/20220114153732.3767229-1-peter.mayd...@linaro.org/
https://patchew.org/QEMU/20220114155032.3767771-1-peter.mayd...@linaro.org/

and I forgot about the 'permision' typo or I'd have folded it into
that 'nits' series, so I'll post that in a moment...

-- PMM

[PATCH] linux-user: Fix comment typo in arm cpu_loop code

Fix a typo in a comment in the arm cpu_loop code.

Signed-off-by: Peter Maydell 
---
 linux-user/arm/cpu_loop.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/arm/cpu_loop.c b/linux-user/arm/cpu_loop.c
index f153ab503a8..032e1ffddfb 100644
--- a/linux-user/arm/cpu_loop.c
+++ b/linux-user/arm/cpu_loop.c
@@ -434,8 +434,8 @@ void cpu_loop(CPUARMState *env)
 case 0x6: /* Access flag fault, level 2 */
 case 0x9: /* Domain fault, level 1 */
 case 0xb: /* Domain fault, level 2 */
-case 0xd: /* Permision fault, level 1 */
-case 0xf: /* Permision fault, level 2 */
+case 0xd: /* Permission fault, level 1 */
+case 0xf: /* Permission fault, level 2 */
 si_signo = TARGET_SIGSEGV;
 si_code = TARGET_SEGV_ACCERR;
 break;
-- 
2.25.1

Re: [Virtio-fs] [PATCH v2] virtiofsd: Do not support blocking flock

2022-01-14 Thread Vivek Goyal

On Thu, Jan 13, 2022 at 04:32:49PM +0100, Sebastian Hasler wrote:
> With the current implementation, blocking flock can lead to
> deadlock. Thus, it's better to return EOPNOTSUPP if a user attempts
> to perform a blocking flock request.
> 
> Signed-off-by: Sebastian Hasler 

Reviewed-by: Vivek Goyal 

Thanks Sebastian. Good fix. I can easily reproduce the deadlock.

shell1> flock foo.txt -c "sleep 10"
shell2> flock foo.txt -c echo

The first command takes a flock on foo.txt. The second command blocks on the
lock, and the only virtiofsd thread serving the virt messages blocks in
flock(). Now the first command never exits. I think it will try to free the
lock once the sleep is over, and that will deadlock: the virtiofsd thread is
blocked and will never wake up, because the lock release operation can never
make progress.

This will be a little painful for people, as they will start seeing
errors. But I guess erroring out early is better than a potential
deadlock later.

Vivek
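
The check the patch adds can be sketched as a standalone predicate. `sketch_flock_precheck` is a hypothetical name for illustration only; the sketch mirrors the `op & LOCK_NB` test from the quoted hunk and is not virtiofsd code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/file.h>

/*
 * Returns 0 if the request may be passed on to flock(), or the errno
 * value to reply with. A request without LOCK_NB could block the only
 * thread serving the queue, so it is refused up front.
 */
static int sketch_flock_precheck(int op)
{
    if (!(op & LOCK_NB)) {
        return EOPNOTSUPP;
    }
    return 0;
}
```

A caller would reply with the returned error when it is non-zero and call `flock()` only when the precheck returns 0, so a contended lock fails immediately instead of parking the thread.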

> ---
>  tools/virtiofsd/passthrough_ll.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 64b5b4fbb1..faa62278c5 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -2442,6 +2442,15 @@ static void lo_flock(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
>  int res;
>  (void)ino;
>  
> +if (!(op & LOCK_NB)) {
> +/*
> + * Blocking flock can deadlock as there is only one thread
> + * serving the queue.
> + */
> +fuse_reply_err(req, EOPNOTSUPP);
> +return;
> +}
> +
>  res = flock(lo_fi_fd(req, fi), op);
>  
>  fuse_reply_err(req, res == -1 ? errno : 0);
> -- 
> 2.33.1
> 

Re: [PATCH 08/30] bsd-user/arm/target_arch_cpu.h: Implement data faults

On Thu, Jan 13, 2022 at 10:40 AM Peter Maydell 
wrote:

> On Sun, 9 Jan 2022 at 16:29, Warner Losh  wrote:
> >
> > Update for the richer set of data faults that are now possible. Copied
> > largely from linux-user/arm/cpu_loop.c
> >
> > Signed-off-by: Warner Losh 
> > ---
> >  bsd-user/arm/target_arch_cpu.h | 44 ++
> >  1 file changed, 34 insertions(+), 10 deletions(-)
> >
> > diff --git a/bsd-user/arm/target_arch_cpu.h
> b/bsd-user/arm/target_arch_cpu.h
> > index 996a361e3fe..51e592bcfe7 100644
> > --- a/bsd-user/arm/target_arch_cpu.h
> > +++ b/bsd-user/arm/target_arch_cpu.h
> > @@ -39,8 +39,7 @@ static inline void target_cpu_init(CPUARMState *env,
> >
> >  static inline void target_cpu_loop(CPUARMState *env)
> >  {
> > -int trapnr;
> > -target_siginfo_t info;
> > +int trapnr, si_signo, si_code;
> >  unsigned int n;
> >  CPUState *cs = env_cpu(env);
> >
> > @@ -143,15 +142,40 @@ static inline void target_cpu_loop(CPUARMState
> *env)
> >  /* just indicate that signals should be handled asap */
> >  break;
> >  case EXCP_PREFETCH_ABORT:
> > -/* See arm/arm/trap.c prefetch_abort_handler() */
> >  case EXCP_DATA_ABORT:
> > -/* See arm/arm/trap.c data_abort_handler() */
> > -info.si_signo = TARGET_SIGSEGV;
> > -info.si_errno = 0;
> > -/* XXX: check env->error_code */
> > -info.si_code = 0;
> > -info.si_addr = env->exception.vaddress;
> > -queue_signal(env, info.si_signo, &info);
> > +/*
> > + * See arm/arm/trap-v6.c prefetch_abort_handler() and
> data_abort_handler()
> > + *
> > + * However, FreeBSD maps these to a generic value and then
> uses that
> > + * to maybe fault in pages in
> vm/vm_fault.c:vm_fault_trap(). I
> > + * believe that the indirection maps the same as Linux, but
> haven't
> > + * chased down every single possible indirection.
> > + */
> > +
> > +/* For user-only we don't set TTBCR_EAE, so look at the
> FSR. */
> > +switch (env->exception.fsr & 0x1f) {
> > +case 0x1: /* Alignment */
> > +si_signo = TARGET_SIGBUS;
> > +si_code = TARGET_BUS_ADRALN;
> > +break;
> > +case 0x3: /* Access flag fault, level 1 */
> > +case 0x6: /* Access flag fault, level 2 */
> > +case 0x9: /* Domain fault, level 1 */
> > +case 0xb: /* Domain fault, level 2 */
> > +case 0xd: /* Permision fault, level 1 */
> > +case 0xf: /* Permision fault, level 2 */
>
> "Permission" (I see we have this typo in linux-user).
>

Fixed. Also, if you can, please cc me if you'd like on 'back ported' fixes
into linux-user when you post them for review that arise from this. It helps
me keep track and not miss them in this rather high volume mailing list.


> > +si_signo = TARGET_SIGSEGV;
> > +si_code = TARGET_SEGV_ACCERR;
> > +break;
> > +case 0x5: /* Translation fault, level 1 */
> > +case 0x7: /* Translation fault, level 2 */
> > +si_signo = TARGET_SIGSEGV;
> > +si_code = TARGET_SEGV_MAPERR;
> > +break;
> > +default:
> > +g_assert_not_reached();
> > +}
>
> Otherwise
> Reviewed-by: Peter Maydell 
>

Thanks!

Warner
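
The fault-status decode in the quoted hunk can be sketched as a pure function, which makes the mapping easy to test on its own. The `sketch_*` enums and function are hypothetical stand-ins for the `TARGET_SIG*` and `TARGET_*` codes, and the "unknown" result replaces the patch's `g_assert_not_reached()` so the sketch stays self-contained.

```c
#include <assert.h>

enum sketch_sig { SKETCH_SIGBUS, SKETCH_SIGSEGV, SKETCH_SIG_UNKNOWN };
enum sketch_code {
    SKETCH_BUS_ADRALN,
    SKETCH_SEGV_ACCERR,
    SKETCH_SEGV_MAPERR,
    SKETCH_CODE_UNKNOWN
};

struct sketch_fault {
    enum sketch_sig signo;
    enum sketch_code code;
};

/* Map the low five bits of a short-descriptor FSR to a signal and code. */
static struct sketch_fault sketch_decode_fsr(unsigned int fsr)
{
    struct sketch_fault f = { SKETCH_SIG_UNKNOWN, SKETCH_CODE_UNKNOWN };

    switch (fsr & 0x1f) {
    case 0x1: /* Alignment */
        f.signo = SKETCH_SIGBUS;
        f.code = SKETCH_BUS_ADRALN;
        break;
    case 0x3: /* Access flag fault, level 1 */
    case 0x6: /* Access flag fault, level 2 */
    case 0x9: /* Domain fault, level 1 */
    case 0xb: /* Domain fault, level 2 */
    case 0xd: /* Permission fault, level 1 */
    case 0xf: /* Permission fault, level 2 */
        f.signo = SKETCH_SIGSEGV;
        f.code = SKETCH_SEGV_ACCERR;
        break;
    case 0x5: /* Translation fault, level 1 */
    case 0x7: /* Translation fault, level 2 */
        f.signo = SKETCH_SIGSEGV;
        f.code = SKETCH_SEGV_MAPERR;
        break;
    default: /* the quoted patch asserts here instead */
        break;
    }
    return f;
}
```

Bits above the low five are masked off before the switch, so an FSR of 0x2d decodes the same way as 0x0d.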

[PATCH v2 8/8] ppc/pnv: rename pnv_pec_stk_update_map()

This function does not use 'stack' anymore. Rename it to
pnv_pec_phb_update_map().

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb4.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index 3dc3c70cb2..1db815b1ab 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -914,7 +914,7 @@ static void pnv_phb4_update_regions(PnvPHB4 *phb)
 pnv_phb4_check_all_mbt(phb);
 }
 
-static void pnv_pec_stk_update_map(PnvPHB4 *phb)
+static void pnv_pec_phb_update_map(PnvPHB4 *phb)
 {
 PnvPhb4PecState *pec = phb->pec;
 MemoryRegion *sysmem = get_system_memory();
@@ -1066,7 +1066,7 @@ static void pnv_pec_stk_nest_xscom_write(void *opaque, hwaddr addr,
 break;
 case PEC_NEST_STK_BAR_EN:
 phb->nest_regs[reg] = val & 0xf000ull;
-pnv_pec_stk_update_map(phb);
+pnv_pec_phb_update_map(phb);
 break;
 case PEC_NEST_STK_DATA_FRZ_TYPE:
 case PEC_NEST_STK_PBCQ_TUN_BAR:
-- 
2.33.1

Re: [PATCH v3 06/19] block: intoduce reqlist


On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:

Split intersecting-requests functionality out of block-copy to be
reused in copy-before-write filter.

Note: while being here, fix tiny typo in MAINTAINERS.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/reqlist.h |  67 +++
  block/block-copy.c  | 116 +---
  block/reqlist.c |  76 ++
  MAINTAINERS |   4 +-
  block/meson.build   |   1 +
  5 files changed, 184 insertions(+), 80 deletions(-)
  create mode 100644 include/block/reqlist.h
  create mode 100644 block/reqlist.c


Looks good to me, this split makes sense.

I have just minor comments (50 % about pre-existing things) below.


diff --git a/include/block/reqlist.h b/include/block/reqlist.h
new file mode 100644
index 00..b904d80216
--- /dev/null
+++ b/include/block/reqlist.h
@@ -0,0 +1,67 @@
+/*
+ * reqlist API
+ *
+ * Copyright (C) 2013 Proxmox Server Solutions
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Authors:
+ *  Dietmar Maurer (diet...@proxmox.com)
+ *  Vladimir Sementsov-Ogievskiy 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef REQLIST_H
+#define REQLIST_H
+
+#include "qemu/coroutine.h"
+
+/*
+ * The API is not thread-safe and shouldn't be. The struct is public to be part
+ * of other structures and protected by third-party locks, see
+ * block/block-copy.c for example.
+ */
+
+typedef struct BlockReq {
+int64_t offset;
+int64_t bytes;
+
+CoQueue wait_queue; /* coroutines blocked on this req */
+QLIST_ENTRY(BlockReq) list;
+} BlockReq;
+
+typedef QLIST_HEAD(, BlockReq) BlockReqList;
+
+/*
+ * Initialize new request and add it to the list. Caller should be sure that


I’d say s/should/must/, because that is guarded by an assertion.


+ * there are no conflicting requests in the list.
+ */
+void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
+  int64_t bytes);
+/* Search for request in the list intersecting with @offset/@bytes area. */
+BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
+int64_t bytes);
+
+/*
+ * If there are no intersecting requests return false. Otherwise, wait for the
+ * first found intersecting request to finish and return true.
+ *
+ * @lock is passed to qemu_co_queue_wait()
+ * False return value proves that lock was NOT released.


I’d say “was released at no point” instead, because when first reading 
this I understood it to mean that lock simply is locked when this 
function returns, and so I thought that this implies that when `true` is 
returned, the lock is released (and remains released).



+ */
+bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
+   int64_t bytes, CoMutex *lock);
+
+/*
+ * Shrink request and wake all waiting coroutines (may be some of them are not


s/may be/maybe/


+ * intersecting with shrunk request).
+ */
+void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes);
+
+/*
+ * Remove request and wake all waiting coroutines. Do not release any memory.
+ */
+void coroutine_fn reqlist_remove_req(BlockReq *req);
+
+#endif /* REQLIST_H */



diff --git a/block/reqlist.c b/block/reqlist.c
new file mode 100644
index 00..5e320ba649
--- /dev/null
+++ b/block/reqlist.c
@@ -0,0 +1,76 @@


[...]


+BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
+int64_t bytes)
+{
+BlockReq *r;
+
+QLIST_FOREACH(r, reqs, list) {
+if (offset + bytes > r->offset && offset < r->offset + r->bytes) {


(Late, I know, the old code was exactly this, but:) Why not use 
ranges_overlap()?



+return r;
+}
+}
+
+return NULL;
+}

[PATCH v2 6/8] ppc/pnv: make PECs create and realize PHB4s

This patch changes the design of the PEC device to create and realize PHB4s
instead of PecStacks. After all the recent changes, PHB4s now contain all
the information needed for their proper functioning, not relying on PecStack
in any capacity.

All changes are being made in a single patch to avoid renaming parts of
the PecState and leaving the code in a strange way. E.g. rename
PecClass->num_stacks to num_phbs, which would then read a
pnv_pec_num_stacks[] array. To avoid mixing the old and new design more
than necessary it's clearer to do these changes in a single step.

The name changes made are:

- in PnvPhb4PecState:
  * rename 'num_stacks' to 'num_phbs'
  * remove the pec->stacks[] array. Current code relies on the
pec->stacks[] obj acting as a simple container, without ever accessing
pec->stacks[] for any other purpose. Instead of converting this into a
pec->phbs[] array, remove it

- in PnvPhb4PecClass, rename *num_stacks to *num_phbs;

- pnv_pec_num_stacks[] is renamed to pnv_pec_num_phbs[].

The logical changes:

- pnv_pec_default_phb_realize():
  * init and set the properties of the PnvPHB4 qdev
  * do not use stack->phb anymore;

- pnv_pec_realize():
  * use the new default_phb_realize() to init/realize each PHB if
running with defaults;

- pnv_pec_instance_init(): removed since we're creating the PHBs during
pec_realize();

- pnv_phb4_get_stack():
  * renamed to pnv_phb4_get_pec() and returns a PnvPhb4PecState*;

- pnv_phb4_realize(): use 'phb->pec' instead of 'stack'.

This design change shouldn't cause any behavioral change in the runtime
of the machine.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb4.c | 26 ++
 hw/pci-host/pnv_phb4_pec.c | 66 ++
 include/hw/pci-host/pnv_phb4.h |  8 ++---
 3 files changed, 31 insertions(+), 69 deletions(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index 2efd34518e..3dc3c70cb2 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -884,7 +884,7 @@ static int pnv_phb4_get_phb_stack_no(PnvPHB4 *phb)
 int stack_no = phb->phb_id;
 
 while (index--) {
-stack_no -= pecc->num_stacks[index];
+stack_no -= pecc->num_phbs[index];
 }
 
 return stack_no;
@@ -1383,7 +1383,7 @@ int pnv_phb4_pec_get_phb_id(PnvPhb4PecState *pec, int 
stack_index)
 int offset = 0;
 
 while (index--) {
-offset += pecc->num_stacks[index];
+offset += pecc->num_phbs[index];
 }
 
 return offset + stack_index;
@@ -1534,8 +1534,8 @@ static void pnv_phb4_instance_init(Object *obj)
 object_initialize_child(obj, "source", &phb->xsrc, TYPE_XIVE_SOURCE);
 }
 
-static PnvPhb4PecStack *pnv_phb4_get_stack(PnvChip *chip, PnvPHB4 *phb,
-   Error **errp)
+static PnvPhb4PecState *pnv_phb4_get_pec(PnvChip *chip, PnvPHB4 *phb,
+ Error **errp)
 {
 Pnv9Chip *chip9 = PNV9_CHIP(chip);
 int chip_id = phb->chip_id;
@@ -1544,14 +1544,14 @@ static PnvPhb4PecStack *pnv_phb4_get_stack(PnvChip 
*chip, PnvPHB4 *phb,
 
 for (i = 0; i < chip->num_pecs; i++) {
 /*
- * For each PEC, check the amount of stacks it supports
- * and see if the given phb4 index matches a stack.
+ * For each PEC, check the amount of phbs it supports
+ * and see if the given phb4 index matches an index.
  */
 PnvPhb4PecState *pec = &chip9->pecs[i];
 
-for (j = 0; j < pec->num_stacks; j++) {
+for (j = 0; j < pec->num_phbs; j++) {
 if (index == pnv_phb4_pec_get_phb_id(pec, j)) {
-return &pec->stacks[j];
+return pec;
 }
 }
 }
@@ -1576,7 +1576,6 @@ static void pnv_phb4_realize(DeviceState *dev, Error 
**errp)
 if (!phb->pec) {
 PnvMachineState *pnv = PNV_MACHINE(qdev_get_machine());
 PnvChip *chip = pnv_get_chip(pnv, phb->chip_id);
-PnvPhb4PecStack *stack;
 PnvPhb4PecClass *pecc;
 BusState *s;
 
@@ -1585,18 +1584,13 @@ static void pnv_phb4_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-stack = pnv_phb4_get_stack(chip, phb, &local_err);
+phb->pec = pnv_phb4_get_pec(chip, phb, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
 }
 
-/*
- * All other phb properties but 'pec' ad 'version' are
- * already set.
- */
-object_property_set_link(OBJECT(phb), "pec", OBJECT(stack->pec),
- &error_abort);
+/* All other phb properties are already set */
 pecc = PNV_PHB4_PEC_GET_CLASS(phb->pec);
 object_property_set_int(OBJECT(phb), "version", pecc->version,
 &error_fatal);
diff --git a/hw/pci-host/pnv_phb4_pec.c b/hw/pci-host/pnv_phb4_pec.c
index d6405d6ca3..852816b9f8 100644
--- a/hw/pci-host/pnv

[PATCH v2 2/8] ppc/pnv: reduce stack->stack_no usage

'stack->stack_no' represents the order that a stack appears in its PEC.
Its primary use is in XSCOM address space calculation in
pnv_phb4_xscom_realize() when calculating the memory region offset.

This attribute is redundant with phb->phb_id, which is calculated via
pnv_phb4_pec_get_phb_id() using stack->stack_no information. It'll also
be awkward to assign it when dealing with PECs and PHBs only in a future
patch.

A new pnv_phb4_get_phb_stack_no() helper is introduced to eliminate most
of the stack->stack_no uses we have. The only use left after this patch
is during pnv_pec_stk_default_phb_realize() when calculating phb_id,
which will also be handled in the next patches.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb4.c | 46 +++---
 1 file changed, 34 insertions(+), 12 deletions(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index 2658ef2d84..4933fe57fe 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -868,6 +868,28 @@ static uint64_t pnv_pec_stk_nest_xscom_read(void *opaque, 
hwaddr addr,
 return phb->nest_regs[reg];
 }
 
+/*
+ * Return the 'stack_no' of a PHB4. 'stack_no' is the order
+ * the PHB4 occupies in the PEC. This is the reverse of what
+ * pnv_phb4_pec_get_phb_id() does.
+ *
+ * E.g. a phb with phb_id = 4 and pec->index = 1 (PEC1) will
+ * be the second phb (stack_no = 1) of the PEC.
+ */
+static int pnv_phb4_get_phb_stack_no(PnvPHB4 *phb)
+{
+PnvPhb4PecState *pec = phb->pec;
+PnvPhb4PecClass *pecc = PNV_PHB4_PEC_GET_CLASS(pec);
+int index = pec->index;
+int stack_no = phb->phb_id;
+
+while (index--) {
+stack_no -= pecc->num_stacks[index];
+}
+
+return stack_no;
+}
+
 static void pnv_phb4_update_regions(PnvPHB4 *phb)
 {
 /* Unmap first always */
@@ -894,10 +916,10 @@ static void pnv_phb4_update_regions(PnvPHB4 *phb)
 
 static void pnv_pec_stk_update_map(PnvPHB4 *phb)
 {
-PnvPhb4PecStack *stack = phb->stack;
 PnvPhb4PecState *pec = phb->pec;
 MemoryRegion *sysmem = get_system_memory();
 uint64_t bar_en = phb->nest_regs[PEC_NEST_STK_BAR_EN];
+int stack_no = pnv_phb4_get_phb_stack_no(phb);
 uint64_t bar, mask, size;
 char name[64];
 
@@ -937,7 +959,7 @@ static void pnv_pec_stk_update_map(PnvPHB4 *phb)
 mask = phb->nest_regs[PEC_NEST_STK_MMIO_BAR0_MASK];
 size = ((~mask) >> 8) + 1;
 snprintf(name, sizeof(name), "pec-%d.%d-phb-%d-mmio0",
- pec->chip_id, pec->index, stack->stack_no);
+ pec->chip_id, pec->index, stack_no);
 memory_region_init(&phb->mmbar0, OBJECT(phb), name, size);
 memory_region_add_subregion(sysmem, bar, &phb->mmbar0);
 phb->mmio0_base = bar;
@@ -949,7 +971,7 @@ static void pnv_pec_stk_update_map(PnvPHB4 *phb)
 mask = phb->nest_regs[PEC_NEST_STK_MMIO_BAR1_MASK];
 size = ((~mask) >> 8) + 1;
 snprintf(name, sizeof(name), "pec-%d.%d-phb-%d-mmio1",
- pec->chip_id, pec->index, stack->stack_no);
+ pec->chip_id, pec->index, stack_no);
 memory_region_init(&phb->mmbar1, OBJECT(phb), name, size);
 memory_region_add_subregion(sysmem, bar, &phb->mmbar1);
 phb->mmio1_base = bar;
@@ -960,7 +982,7 @@ static void pnv_pec_stk_update_map(PnvPHB4 *phb)
 bar = phb->nest_regs[PEC_NEST_STK_PHB_REGS_BAR] >> 8;
 size = PNV_PHB4_NUM_REGS << 3;
 snprintf(name, sizeof(name), "pec-%d.%d-phb-%d",
- pec->chip_id, pec->index, stack->stack_no);
+ pec->chip_id, pec->index, stack_no);
 memory_region_init(&phb->phbbar, OBJECT(phb), name, size);
 memory_region_add_subregion(sysmem, bar, &phb->phbbar);
 }
@@ -969,7 +991,7 @@ static void pnv_pec_stk_update_map(PnvPHB4 *phb)
 bar = phb->nest_regs[PEC_NEST_STK_INT_BAR] >> 8;
 size = PNV_PHB4_MAX_INTs << 16;
 snprintf(name, sizeof(name), "pec-%d.%d-phb-%d-int",
- phb->pec->chip_id, phb->pec->index, stack->stack_no);
+ phb->pec->chip_id, phb->pec->index, stack_no);
 memory_region_init(&phb->intbar, OBJECT(phb), name, size);
 memory_region_add_subregion(sysmem, bar, &phb->intbar);
 }
@@ -1458,9 +1480,9 @@ static AddressSpace *pnv_phb4_dma_iommu(PCIBus *bus, void 
*opaque, int devfn)
 
 static void pnv_phb4_xscom_realize(PnvPHB4 *phb)
 {
-PnvPhb4PecStack *stack = phb->stack;
 PnvPhb4PecState *pec = phb->pec;
 PnvPhb4PecClass *pecc = PNV_PHB4_PEC_GET_CLASS(pec);
+int stack_no = pnv_phb4_get_phb_stack_no(phb);
 uint32_t pec_nest_base;
 uint32_t pec_pci_base;
 char name[64];
@@ -1469,20 +1491,20 @@ static void pnv_phb4_xscom_realize(PnvPHB4 *phb)
 
 /* Initialize the XSCOM regions for the stack registers */
 snprintf(name, sizeof(name), "xscom-pec-%d.%d-nest-phb-%d",
- pec->chip_id, pec->index, stack->stack_no);
+ pec->chip_id, pec->index, stack_no);
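
The relationship between phb_id and stack_no described in this patch can be sketched outside QEMU as a pair of inverse mappings. This is a minimal sketch under stated assumptions: the per-PEC counts { 1, 2, 3 } and all sketch_* names are hypothetical, chosen only to illustrate the arithmetic of the two while (index--) loops.

```c
#include <assert.h>

/* Assumed per-PEC PHB counts, for illustration only. */
static const int sketch_num_stacks[] = { 1, 2, 3 };

/* Forward mapping: (PEC index, position within the PEC) -> global phb_id. */
static int sketch_get_phb_id(int pec_index, int stack_no)
{
    int offset = 0;

    assert(pec_index >= 0 && pec_index < 3);

    /* Sum the PHB counts of all PECs that come before this one. */
    while (pec_index--) {
        offset += sketch_num_stacks[pec_index];
    }

    return offset + stack_no;
}

/* Reverse mapping: (PEC index, global phb_id) -> position within the PEC. */
static int sketch_get_stack_no(int pec_index, int phb_id)
{
    int stack_no = phb_id;

    assert(pec_index >= 0 && pec_index < 3);

    /* Subtract the PHB counts of all PECs that come before this one. */
    while (pec_index--) {
        stack_no -= sketch_num_stacks[pec_index];
    }

    return stack_no;
}
```

Because the reverse mapping only needs the PEC index and phb_id, storing a separate stack_no attribute becomes unnecessary, which is the point the commit message makes.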

Re: [PATCH 1/1] virtio: fix the condition for iommu_platform not supported

2022-01-14 Thread Michael S. Tsirkin

On Fri, Jan 14, 2022 at 05:05:56PM +0100, Halil Pasic wrote:
> On Thu, 13 Jan 2022 20:54:52 +0100
> Halil Pasic  wrote:
> 
> > > > This is the very reason for commit 7ef7e6e3b ("vhost: correctly
> > > > turn on VIRTIO_F_IOMMU_PLATFORM"), which fences _F_ACCESS_PLATFORM
> > > > from the vhost device that does not need it, because on the vhost
> > > > interface it only means "I/O address translation is needed".
> > > > 
> > > > This patch takes inspiration from 7ef7e6e3b ("vhost: correctly turn on
> > > > VIRTIO_F_IOMMU_PLATFORM"),
> > > 
> > > Strange, I could not find this commit. Did you mean f7ef7e6e3b?
> > >   
> > 
> > Right! Copy-paste error.
> > 
> > 
> 
> Should I spin a v2 to correct this?
> 
> 
> Sorry for the hunk below. I wanted to post the  whole patch in question,
> then deleted it, but left some leftovers. Another copy-paste error. Grrr

Yes pls.

> >  
> >  static void *vhost_memory_map(struct vhost_dev *dev, hwaddr addr,
> > @@ -765,6 +772,9 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
> >  if (enable_log) {
> >  features |= 0x1ULL << VHOST_F_LOG_ALL;
> >  }
> > +if (!vhost_dev_has_iommu(dev)) {
> > +features &= ~(0x1ULL << VIRTIO_F_IOMMU_PLATFORM);
> > +}
> >  r = dev->vhost_ops->vhost_set_features(dev, features);
> >  if (r < 0) {
> >  VHOST_OPS_DEBUG("vhost_set_features failed");
> > 
> > > > and uses the same condition for detecting the

[PATCH v2 5/8] ppc/pnv: remove PnvPhb4PecStack::stack_no

pnv_pec_default_phb_realize() stopped using it after the previous patch and
no one else is using it.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb4_pec.c | 2 --
 include/hw/pci-host/pnv_phb4.h | 3 ---
 2 files changed, 5 deletions(-)

diff --git a/hw/pci-host/pnv_phb4_pec.c b/hw/pci-host/pnv_phb4_pec.c
index a80a21db77..d6405d6ca3 100644
--- a/hw/pci-host/pnv_phb4_pec.c
+++ b/hw/pci-host/pnv_phb4_pec.c
@@ -166,7 +166,6 @@ static void pnv_pec_realize(DeviceState *dev, Error **errp)
 PnvPhb4PecStack *stack = &pec->stacks[i];
 Object *stk_obj = OBJECT(stack);
 
-object_property_set_int(stk_obj, "stack-no", i, &error_abort);
 object_property_set_link(stk_obj, "pec", OBJECT(pec), &error_abort);
 
 if (defaults_enabled()) {
@@ -314,7 +313,6 @@ static void pnv_pec_stk_realize(DeviceState *dev, Error 
**errp)
 }
 
 static Property pnv_pec_stk_properties[] = {
-DEFINE_PROP_UINT32("stack-no", PnvPhb4PecStack, stack_no, 0),
 DEFINE_PROP_LINK("pec", PnvPhb4PecStack, pec, TYPE_PNV_PHB4_PEC,
  PnvPhb4PecState *),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h
index a9059b7279..2be56b7afd 100644
--- a/include/hw/pci-host/pnv_phb4.h
+++ b/include/hw/pci-host/pnv_phb4.h
@@ -171,9 +171,6 @@ OBJECT_DECLARE_SIMPLE_TYPE(PnvPhb4PecStack, 
PNV_PHB4_PEC_STACK)
 struct PnvPhb4PecStack {
 DeviceState parent;
 
-/* My own stack number */
-uint32_t stack_no;
-
 /* The owner PEC */
 PnvPhb4PecState *pec;
 
-- 
2.33.1

[PATCH v2 1/8] ppc/pnv: introduce PnvPHB4 'pec' property

This property will track the owner PEC of this PHB. For now it's
redundant since we can retrieve the PEC via phb->stack->pec but it
will not be redundant when we get rid of the stack device.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb4.c | 19 +--
 hw/pci-host/pnv_phb4_pec.c |  2 ++
 include/hw/pci-host/pnv_phb4.h |  3 +++
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index b5045fca64..2658ef2d84 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -895,7 +895,7 @@ static void pnv_phb4_update_regions(PnvPHB4 *phb)
 static void pnv_pec_stk_update_map(PnvPHB4 *phb)
 {
 PnvPhb4PecStack *stack = phb->stack;
-PnvPhb4PecState *pec = stack->pec;
+PnvPhb4PecState *pec = phb->pec;
 MemoryRegion *sysmem = get_system_memory();
 uint64_t bar_en = phb->nest_regs[PEC_NEST_STK_BAR_EN];
 uint64_t bar, mask, size;
@@ -969,7 +969,7 @@ static void pnv_pec_stk_update_map(PnvPHB4 *phb)
 bar = phb->nest_regs[PEC_NEST_STK_INT_BAR] >> 8;
 size = PNV_PHB4_MAX_INTs << 16;
 snprintf(name, sizeof(name), "pec-%d.%d-phb-%d-int",
- stack->pec->chip_id, stack->pec->index, stack->stack_no);
+ phb->pec->chip_id, phb->pec->index, stack->stack_no);
 memory_region_init(&phb->intbar, OBJECT(phb), name, size);
 memory_region_add_subregion(sysmem, bar, &phb->intbar);
 }
@@ -982,7 +982,7 @@ static void pnv_pec_stk_nest_xscom_write(void *opaque, 
hwaddr addr,
  uint64_t val, unsigned size)
 {
 PnvPHB4 *phb = PNV_PHB4(opaque);
-PnvPhb4PecState *pec = phb->stack->pec;
+PnvPhb4PecState *pec = phb->pec;
 uint32_t reg = addr >> 3;
 
 switch (reg) {
@@ -1459,7 +1459,7 @@ static AddressSpace *pnv_phb4_dma_iommu(PCIBus *bus, void 
*opaque, int devfn)
 static void pnv_phb4_xscom_realize(PnvPHB4 *phb)
 {
 PnvPhb4PecStack *stack = phb->stack;
-PnvPhb4PecState *pec = stack->pec;
+PnvPhb4PecState *pec = phb->pec;
 PnvPhb4PecClass *pecc = PNV_PHB4_PEC_GET_CLASS(pec);
 uint32_t pec_nest_base;
 uint32_t pec_pci_base;
@@ -1568,8 +1568,13 @@ static void pnv_phb4_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-/* All other phb properties but 'version' are already set */
-pecc = PNV_PHB4_PEC_GET_CLASS(phb->stack->pec);
+/*
+ * All other phb properties but 'pec' ad 'version' are
+ * already set.
+ */
+object_property_set_link(OBJECT(phb), "pec", OBJECT(phb->stack->pec),
+ &error_abort);
+pecc = PNV_PHB4_PEC_GET_CLASS(phb->pec);
 object_property_set_int(OBJECT(phb), "version", pecc->version,
 &error_fatal);
 
@@ -1682,6 +1687,8 @@ static Property pnv_phb4_properties[] = {
 DEFINE_PROP_UINT64("version", PnvPHB4, version, 0),
 DEFINE_PROP_LINK("stack", PnvPHB4, stack, TYPE_PNV_PHB4_PEC_STACK,
  PnvPhb4PecStack *),
+DEFINE_PROP_LINK("pec", PnvPHB4, pec, TYPE_PNV_PHB4_PEC,
+ PnvPhb4PecState *),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/pci-host/pnv_phb4_pec.c b/hw/pci-host/pnv_phb4_pec.c
index 7fe7f1f007..22194b8de2 100644
--- a/hw/pci-host/pnv_phb4_pec.c
+++ b/hw/pci-host/pnv_phb4_pec.c
@@ -285,6 +285,8 @@ static void pnv_pec_stk_default_phb_realize(PnvPhb4PecStack 
*stack,
 
 stack->phb = PNV_PHB4(qdev_new(TYPE_PNV_PHB4));
 
+object_property_set_link(OBJECT(stack->phb), "pec", OBJECT(pec),
+ &error_abort);
 object_property_set_int(OBJECT(stack->phb), "chip-id", pec->chip_id,
 &error_fatal);
 object_property_set_int(OBJECT(stack->phb), "index", phb_id,
diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h
index 6968efaba8..1d27e4c0cb 100644
--- a/include/hw/pci-host/pnv_phb4.h
+++ b/include/hw/pci-host/pnv_phb4.h
@@ -84,6 +84,9 @@ struct PnvPHB4 {
 
 uint64_t version;
 
+/* The owner PEC */
+PnvPhb4PecState *pec;
+
 char bus_path[8];
 
 /* Main register images */
-- 
2.33.1

[PATCH v2 3/8] ppc/pnv: remove stack pointer from PnvPHB4

This pointer was being used for two reasons: pnv_phb4_update_regions()
was using it to access the PHB and phb4_realize() was using it as a way
to determine if the PHB was user created.

We can determine if the PHB is user created via phb->pec, introduced in
the previous patch, and pnv_phb4_update_regions() is no longer using
stack->phb.

Remove the pointer from the PnvPHB4 device.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb4.c | 15 ---
 hw/pci-host/pnv_phb4_pec.c |  2 --
 include/hw/pci-host/pnv_phb4.h |  2 --
 3 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index 4933fe57fe..2efd34518e 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -1573,9 +1573,10 @@ static void pnv_phb4_realize(DeviceState *dev, Error 
**errp)
 char name[32];
 
 /* User created PHB */
-if (!phb->stack) {
+if (!phb->pec) {
 PnvMachineState *pnv = PNV_MACHINE(qdev_get_machine());
 PnvChip *chip = pnv_get_chip(pnv, phb->chip_id);
+PnvPhb4PecStack *stack;
 PnvPhb4PecClass *pecc;
 BusState *s;
 
@@ -1584,7 +1585,7 @@ static void pnv_phb4_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-phb->stack = pnv_phb4_get_stack(chip, phb, &local_err);
+stack = pnv_phb4_get_stack(chip, phb, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -1594,18 +1595,12 @@ static void pnv_phb4_realize(DeviceState *dev, Error 
**errp)
  * All other phb properties but 'pec' ad 'version' are
  * already set.
  */
-object_property_set_link(OBJECT(phb), "pec", OBJECT(phb->stack->pec),
+object_property_set_link(OBJECT(phb), "pec", OBJECT(stack->pec),
  &error_abort);
 pecc = PNV_PHB4_PEC_GET_CLASS(phb->pec);
 object_property_set_int(OBJECT(phb), "version", pecc->version,
 &error_fatal);
 
-/*
- * Assign stack->phb since pnv_phb4_update_regions() uses it
- * to access the phb.
- */
-phb->stack->phb = phb;
-
 /*
  * Reparent user created devices to the chip to build
  * correctly the device tree.
@@ -1707,8 +1702,6 @@ static Property pnv_phb4_properties[] = {
 DEFINE_PROP_UINT32("index", PnvPHB4, phb_id, 0),
 DEFINE_PROP_UINT32("chip-id", PnvPHB4, chip_id, 0),
 DEFINE_PROP_UINT64("version", PnvPHB4, version, 0),
-DEFINE_PROP_LINK("stack", PnvPHB4, stack, TYPE_PNV_PHB4_PEC_STACK,
- PnvPhb4PecStack *),
 DEFINE_PROP_LINK("pec", PnvPHB4, pec, TYPE_PNV_PHB4_PEC,
  PnvPhb4PecState *),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/pci-host/pnv_phb4_pec.c b/hw/pci-host/pnv_phb4_pec.c
index 22194b8de2..ed1d644182 100644
--- a/hw/pci-host/pnv_phb4_pec.c
+++ b/hw/pci-host/pnv_phb4_pec.c
@@ -293,8 +293,6 @@ static void pnv_pec_stk_default_phb_realize(PnvPhb4PecStack 
*stack,
 &error_fatal);
 object_property_set_int(OBJECT(stack->phb), "version", pecc->version,
 &error_fatal);
-object_property_set_link(OBJECT(stack->phb), "stack", OBJECT(stack),
- &error_abort);
 
 if (!sysbus_realize(SYS_BUS_DEVICE(stack->phb), errp)) {
 return;
diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h
index 1d27e4c0cb..a9059b7279 100644
--- a/include/hw/pci-host/pnv_phb4.h
+++ b/include/hw/pci-host/pnv_phb4.h
@@ -151,8 +151,6 @@ struct PnvPHB4 {
 XiveSource xsrc;
 qemu_irq *qirqs;
 
-PnvPhb4PecStack *stack;
-
 QLIST_HEAD(, PnvPhb4DMASpace) dma_spaces;
 };
 
-- 
2.33.1

[PATCH v2 7/8] ppc/pnv: remove PnvPhb4PecStack object

All the complexity that was scattered between PnvPhb4PecStack and
PnvPHB4 is now centered in the PnvPHB4 device. PnvPhb4PecStack does not
serve any purpose in the current code base.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb4_pec.c | 33 -
 include/hw/pci-host/pnv_phb4.h | 17 -
 2 files changed, 50 deletions(-)

diff --git a/hw/pci-host/pnv_phb4_pec.c b/hw/pci-host/pnv_phb4_pec.c
index 852816b9f8..12aa459628 100644
--- a/hw/pci-host/pnv_phb4_pec.c
+++ b/hw/pci-host/pnv_phb4_pec.c
@@ -278,42 +278,9 @@ static const TypeInfo pnv_pec_type_info = {
 }
 };
 
-static void pnv_pec_stk_realize(DeviceState *dev, Error **errp)
-{
-}
-
-static Property pnv_pec_stk_properties[] = {
-DEFINE_PROP_LINK("pec", PnvPhb4PecStack, pec, TYPE_PNV_PHB4_PEC,
- PnvPhb4PecState *),
-DEFINE_PROP_END_OF_LIST(),
-};
-
-static void pnv_pec_stk_class_init(ObjectClass *klass, void *data)
-{
-DeviceClass *dc = DEVICE_CLASS(klass);
-
-device_class_set_props(dc, pnv_pec_stk_properties);
-dc->realize = pnv_pec_stk_realize;
-dc->user_creatable = false;
-
-/* TODO: reset regs ? */
-}
-
-static const TypeInfo pnv_pec_stk_type_info = {
-.name  = TYPE_PNV_PHB4_PEC_STACK,
-.parent= TYPE_DEVICE,
-.instance_size = sizeof(PnvPhb4PecStack),
-.class_init= pnv_pec_stk_class_init,
-.interfaces= (InterfaceInfo[]) {
-{ TYPE_PNV_XSCOM_INTERFACE },
-{ }
-}
-};
-
 static void pnv_pec_register_types(void)
 {
 type_register_static(&pnv_pec_type_info);
-type_register_static(&pnv_pec_stk_type_info);
 }
 
 type_init(pnv_pec_register_types);
diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h
index e750165e77..74fdec2b47 100644
--- a/include/hw/pci-host/pnv_phb4.h
+++ b/include/hw/pci-host/pnv_phb4.h
@@ -164,23 +164,6 @@ extern const MemoryRegionOps pnv_phb4_xscom_ops;
 #define TYPE_PNV_PHB4_PEC "pnv-phb4-pec"
 OBJECT_DECLARE_TYPE(PnvPhb4PecState, PnvPhb4PecClass, PNV_PHB4_PEC)
 
-#define TYPE_PNV_PHB4_PEC_STACK "pnv-phb4-pec-stack"
-OBJECT_DECLARE_SIMPLE_TYPE(PnvPhb4PecStack, PNV_PHB4_PEC_STACK)
-
-/* Per-stack data */
-struct PnvPhb4PecStack {
-DeviceState parent;
-
-/* The owner PEC */
-PnvPhb4PecState *pec;
-
-/*
- * PHB4 pointer. pnv_phb4_update_regions() needs to access
- * the PHB4 via a PnvPhb4PecStack pointer.
- */
-PnvPHB4 *phb;
-};
-
 struct PnvPhb4PecState {
 DeviceState parent;
 
-- 
2.33.1

[PATCH v2 4/8] ppc/pnv: move default_phb_realize() to pec_realize()

Move the current pnv_pec_stk_default_phb_realize() call to
pec_realize(), renaming the function to pnv_pec_default_phb_realize(),
and set the PHB attributes using the PEC object directly.

This will be important to allow PEC devices to handle PHB4s
directly later on.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/pci-host/pnv_phb4_pec.c | 63 --
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/hw/pci-host/pnv_phb4_pec.c b/hw/pci-host/pnv_phb4_pec.c
index ed1d644182..a80a21db77 100644
--- a/hw/pci-host/pnv_phb4_pec.c
+++ b/hw/pci-host/pnv_phb4_pec.c
@@ -112,6 +112,30 @@ static const MemoryRegionOps pnv_pec_pci_xscom_ops = {
 .endianness = DEVICE_BIG_ENDIAN,
 };
 
+static void pnv_pec_default_phb_realize(PnvPhb4PecStack *stack,
+int stack_no,
+Error **errp)
+{
+PnvPhb4PecState *pec = stack->pec;
+PnvPhb4PecClass *pecc = PNV_PHB4_PEC_GET_CLASS(pec);
+int phb_id = pnv_phb4_pec_get_phb_id(pec, stack_no);
+
+stack->phb = PNV_PHB4(qdev_new(TYPE_PNV_PHB4));
+
+object_property_set_link(OBJECT(stack->phb), "pec", OBJECT(pec),
+ &error_abort);
+object_property_set_int(OBJECT(stack->phb), "chip-id", pec->chip_id,
+&error_fatal);
+object_property_set_int(OBJECT(stack->phb), "index", phb_id,
+&error_fatal);
+object_property_set_int(OBJECT(stack->phb), "version", pecc->version,
+&error_fatal);
+
+if (!sysbus_realize(SYS_BUS_DEVICE(stack->phb), errp)) {
+return;
+}
+}
+
 static void pnv_pec_instance_init(Object *obj)
 {
 PnvPhb4PecState *pec = PNV_PHB4_PEC(obj);
@@ -144,6 +168,15 @@ static void pnv_pec_realize(DeviceState *dev, Error **errp)
 
 object_property_set_int(stk_obj, "stack-no", i, &error_abort);
 object_property_set_link(stk_obj, "pec", OBJECT(pec), &error_abort);
+
+if (defaults_enabled()) {
+pnv_pec_default_phb_realize(stack, i, errp);
+}
+
+/*
+ * qdev gets angry if we don't realize 'stack' here, even
+ * if stk_realize() is now empty.
+ */
 if (!qdev_realize(DEVICE(stk_obj), NULL, errp)) {
 return;
 }
@@ -276,38 +309,8 @@ static const TypeInfo pnv_pec_type_info = {
 }
 };
 
-static void pnv_pec_stk_default_phb_realize(PnvPhb4PecStack *stack,
-Error **errp)
-{
-PnvPhb4PecState *pec = stack->pec;
-PnvPhb4PecClass *pecc = PNV_PHB4_PEC_GET_CLASS(pec);
-int phb_id = pnv_phb4_pec_get_phb_id(pec, stack->stack_no);
-
-stack->phb = PNV_PHB4(qdev_new(TYPE_PNV_PHB4));
-
-object_property_set_link(OBJECT(stack->phb), "pec", OBJECT(pec),
- &error_abort);
-object_property_set_int(OBJECT(stack->phb), "chip-id", pec->chip_id,
-&error_fatal);
-object_property_set_int(OBJECT(stack->phb), "index", phb_id,
-&error_fatal);
-object_property_set_int(OBJECT(stack->phb), "version", pecc->version,
-&error_fatal);
-
-if (!sysbus_realize(SYS_BUS_DEVICE(stack->phb), errp)) {
-return;
-}
-}
-
 static void pnv_pec_stk_realize(DeviceState *dev, Error **errp)
 {
-PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(dev);
-
-if (!defaults_enabled()) {
-return;
-}
-
-pnv_pec_stk_default_phb_realize(stack, errp);
 }
 
 static Property pnv_pec_stk_properties[] = {
-- 
2.33.1

[PATCH v2 0/8] remove PnvPhb4PecStack from Powernv9

Hi,

This second version contains improvements suggested by Cedric in the
v1 review.

Patches 1-10 from v1 are already accepted and aren't included in this
v2.


Changes from v1:
- v1 patches 1-10: already accepted, not included in the v2
- 'stack->stack_no' use is eliminated. We're now deriving stack_no from
phb->phb_id
- no longer use phb->phb_number
- no longer use pec->phbs[]
- v1 link: https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg03000.html

Daniel Henrique Barboza (8):
  ppc/pnv: introduce PnvPHB4 'pec' property
  ppc/pnv: reduce stack->stack_no usage
  ppc/pnv: remove stack pointer from PnvPHB4
  ppc/pnv: move default_phb_realize() to pec_realize()
  ppc/pnv: remove PnvPhb4PecStack::stack_no
  ppc/pnv: make PECs create and realize PHB4s
  ppc/pnv: remove PnvPhb4PecStack object
  ppc/pnv: rename pnv_pec_stk_update_map()

 hw/pci-host/pnv_phb4.c |  88 ++--
 hw/pci-host/pnv_phb4_pec.c | 118 -
 include/hw/pci-host/pnv_phb4.h |  33 ++---
 3 files changed, 86 insertions(+), 153 deletions(-)

-- 
2.33.1

Re: [PATCH 2/2] hw/i386: support loading OVMF using -bios too

2022-01-14 Thread Michael S. Tsirkin

On Thu, Jan 13, 2022 at 04:55:11PM +, Daniel P. Berrangé wrote:
> Traditionally the OVMF firmware has been loaded using the pflash
> mechanism. This is because it is usually provided as a pair of
> files, one read-only containing the code and one writable to
> provide persistence of non-volatile firmware variables.
> 
> The AMD SEV build of EDK2, however, is provided as a single
> file that contains only the code. This is intended to be used
> read-only and explicitly does not provide any ability for
> persistence of non-volatile firmware variables. While it is
> possible to configure this with the pflash mechanism, by only
> providing one of the 2 pflash blobs, conceptually it is a
> little strange to use pflash if there won't be any persistent
> data.
> 
> A stateless OVMF build can be loaded with -bios, however, QEMU
> does not currently initialize SEV in that scenario. This patch
> introduces the call needed for SEV initialization of the
> firmware.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  hw/i386/x86.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index b84840a1bb..c79d84936f 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -45,6 +45,7 @@
>  #include "target/i386/cpu.h"
>  #include "hw/i386/topology.h"
>  #include "hw/i386/fw_cfg.h"
> +#include "hw/i386/pc.h"
>  #include "hw/intc/i8259.h"
>  #include "hw/rtc/mc146818rtc.h"
>  #include "target/i386/sev.h"

This builds fine because there's a stub in pc_sysfw_ovmf-stubs.c

The unfortunate thing about this however is that it's too easy to pull
in a PC dependency, and people building with CONFIG_PC will not notice
until it breaks for others.

Is it time we split pc.h further and had pc_sysfw_ovmf.h ?

> @@ -1157,6 +1158,10 @@ void x86_bios_rom_init(MachineState *ms, const char 
> *default_firmware,
>  memory_region_add_subregion(rom_memory,
>  (uint32_t)(-bios_size),
>  bios);
> +
> +pc_system_ovmf_initialize_sev(
> +rom_ptr((uint32_t)-bios_size, bios_size),
> +bios_size);

Just curious about the formatting here:

pc_system_ovmf_initialize_sev(rom_ptr((uint32_t)-bios_size, bios_size),
  bios_size);

would be prettier ...

>  }
>  
>  bool x86_machine_is_smm_enabled(const X86MachineState *x86ms)
> -- 
> 2.33.1
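
The (uint32_t)-bios_size expression in the hunk above relies on unsigned wrap-around to obtain the address at which an image ending exactly at 4 GiB begins. A minimal sketch of that arithmetic, assuming bios_size is a positive int; sketch_bios_base() is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Start address of a firmware image of bios_size bytes that is mapped
 * so that it ends exactly at the 4 GiB boundary.
 */
static uint32_t sketch_bios_base(int bios_size)
{
    assert(bios_size > 0);

    /* Conversion to uint32_t is modulo 2^32, so this is 2^32 - bios_size. */
    return (uint32_t)(-bios_size);
}
```

For a 4 MiB image (0x400000 bytes) this gives 0xFFC00000, i.e. the last 4 MiB below 4 GiB.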

Re: [PATCH v3 05/19] block/block-copy: add block_copy_reset()


On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:

Split block_copy_reset() out of block_copy_reset_unallocated() to be
used separately later.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block-copy.h |  1 +
  block/block-copy.c | 21 +
  2 files changed, 14 insertions(+), 8 deletions(-)


Reviewed-by: Hanna Reitz

Re: [PATCH v3 04/19] block/copy-before-write: add bitmap open parameter


On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:

This brings "incremental" mode to the copy-before-write filter: the user
can specify a bitmap so that the filter copies only "dirty" areas.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block-core.json  | 10 +-
  block/copy-before-write.c | 30 +-
  2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1d3dd9cb48..6904daeacf 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4167,11 +4167,19 @@
  #
  # @target: The target for copy-before-write operations.
  #
+# @bitmap: If specified, copy-before-write filter will do
+#  copy-before-write operations only for dirty regions of the
+#  bitmap. Bitmap size must be equal to length of file and
+#  target child of the filter. Note also, that bitmap is used
+#  only to initialize internal bitmap of the process, so further
+#  modifications (or removing) of specified bitmap doesn't
+#  influence the filter.
+#
  # Since: 6.2
  ##
  { 'struct': 'BlockdevOptionsCbw',
'base': 'BlockdevOptionsGenericFormat',
-  'data': { 'target': 'BlockdevRef' } }
+  'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
  
  ##

  # @BlockdevOptions:
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 799223e3fb..4cd90d22df 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -149,6 +149,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
  Error **errp)
  {
  BDRVCopyBeforeWriteState *s = bs->opaque;
+BdrvDirtyBitmap *bitmap = NULL;
  
  bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,

 BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -163,6 +164,33 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
  return -EINVAL;
  }
  
+if (qdict_haskey(options, "bitmap.node") ||

+qdict_haskey(options, "bitmap.name"))
+{
+const char *bitmap_node, *bitmap_name;
+
+if (!qdict_haskey(options, "bitmap.node")) {
+error_setg(errp, "bitmap.node is not specified");
+return -EINVAL;
+}
+
+if (!qdict_haskey(options, "bitmap.name")) {
+error_setg(errp, "bitmap.name is not specified");
+return -EINVAL;
+}
+
+bitmap_node = qdict_get_str(options, "bitmap.node");
+bitmap_name = qdict_get_str(options, "bitmap.name");
+qdict_del(options, "bitmap.node");
+qdict_del(options, "bitmap.name");


I’m not really a fan of this manual parsing, but I can see nothing 
technically wrong with it.


Still, what do you think of using an input visitor, like:

QDict *bitmap_qdict;

qdict_extract_subqdict(options, &bitmap_qdict, "bitmap.");
if (qdict_size(bitmap_qdict) > 0) {
    BlockDirtyBitmap *bmp_param;
    Visitor *v = qobject_input_visitor_new_flat_confused(bitmap_qdict, 
errp);

    visit_type_BlockDirtyBitmap(v, NULL, &bmp_param, errp);
    visit_free(v);
    qobject_unref(bitmap_qdict);

    bitmap = block_dirty_bitmap_lookup(bmp_param->node, 
bmp_param->name, ...);

    qapi_free_BlockDirtyBitmap(bmp_param);
}

(+ error handling, which is why perhaps the first block should be put 
into a separate function cbw_get_bitmap_param() to simplify error handling)



+
+bitmap = block_dirty_bitmap_lookup(bitmap_node, bitmap_name, NULL,
+   errp);
+if (!bitmap) {
+return -EINVAL;
+}
+}
+
  bs->total_sectors = bs->file->bs->total_sectors;
  bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
  (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
@@ -170,7 +198,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
  ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
   bs->file->bs->supported_zero_flags);
  
-s->bcs = block_copy_state_new(bs->file, s->target, NULL, errp);

+s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
  if (!s->bcs) {
  error_prepend(errp, "Cannot create block-copy-state: ");
  return -EINVAL;

Re: [PATCH v3 03/19] block/block-copy: block_copy_state_new(): add bitmap parameter


On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:

This will be used in the following commit to bring "incremental" mode
to the copy-before-write filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block-copy.h |  2 +-
  block/block-copy.c | 14 --
  block/copy-before-write.c  |  2 +-
  3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 99370fa38b..8da4cec1b6 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,7 +25,7 @@ typedef struct BlockCopyState BlockCopyState;
  typedef struct BlockCopyCallState BlockCopyCallState;
  
  BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,

- Error **errp);
+ BdrvDirtyBitmap *bitmap, Error **errp);
  
  /* Function should be called prior any actual copy request */

  void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
diff --git a/block/block-copy.c b/block/block-copy.c
index abda7a80bd..f6345e3a4c 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -384,8 +384,9 @@ static int64_t 
block_copy_calculate_cluster_size(BlockDriverState *target,
  }
  
  BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,

- Error **errp)
+ BdrvDirtyBitmap *bitmap, Error **errp)


Could be `const` to signal that we won’t be using this bitmap for the 
BCS, but given our inconsistent usage of `const`, it isn’t anything 
that’d be important.



  {
+ERRP_GUARD();
  BlockCopyState *s;
  int64_t cluster_size;
  BdrvDirtyBitmap *copy_bitmap;
@@ -402,7 +403,16 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
  return NULL;
  }
  bdrv_disable_dirty_bitmap(copy_bitmap);
-bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
+if (bitmap) {
+if (!bdrv_merge_dirty_bitmap(copy_bitmap, bitmap, NULL, errp)) {
+error_prepend(errp, "Failed to merge bitmap '%s' to internal "
+  "copy-bitmap: ", bdrv_dirty_bitmap_name(bitmap));
+return NULL;


Should we free `copy_bitmap` here?

(Apart from this, looks good to me!)
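
To illustrate the question, here is a minimal, self-contained sketch of the error path with the release added. It does not use the QEMU API: `Bitmap`, `State`, `bitmap_new()`, `bitmap_release()`, `bitmap_merge()` and the `live_bitmaps` counter are hypothetical stand-ins, and the merge is made to fail artificially so the cleanup can be observed.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the QEMU types. */
typedef struct { bool valid; } Bitmap;
typedef struct { Bitmap *copy_bitmap; } State;

static int live_bitmaps;            /* allocated-but-not-released bitmaps */

static Bitmap *bitmap_new(void)
{
    Bitmap *b = calloc(1, sizeof(*b));
    if (b) {
        b->valid = true;
        live_bitmaps++;
    }
    return b;
}

static void bitmap_release(Bitmap *b)
{
    if (b) {
        live_bitmaps--;
        free(b);
    }
}

/* Pretend merge: fails when the source bitmap is marked invalid. */
static bool bitmap_merge(Bitmap *dst, const Bitmap *src)
{
    (void)dst;
    return src->valid;
}

static State *state_new(const Bitmap *user_bitmap)
{
    Bitmap *copy = bitmap_new();
    if (!copy) {
        return NULL;
    }
    if (user_bitmap && !bitmap_merge(copy, user_bitmap)) {
        bitmap_release(copy);       /* the release asked about above */
        return NULL;
    }
    State *s = calloc(1, sizeof(*s));
    if (!s) {
        bitmap_release(copy);
        return NULL;
    }
    s->copy_bitmap = copy;
    return s;
}
```

Without the `bitmap_release()` call on the merge-failure path, the internal bitmap would outlive the failed constructor in this sketch.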


+}
+} else {
+bdrv_set_dirty_bitmap(copy_bitmap, 0,
+  bdrv_dirty_bitmap_size(copy_bitmap));
+}
  
  /*

   * If source is in backing chain of target assume that target is going to 
be
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 5bdaf0a9d9..799223e3fb 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -170,7 +170,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
  ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
   bs->file->bs->supported_zero_flags);
  
-s->bcs = block_copy_state_new(bs->file, s->target, errp);

+s->bcs = block_copy_state_new(bs->file, s->target, NULL, errp);
  if (!s->bcs) {
  error_prepend(errp, "Cannot create block-copy-state: ");
  return -EINVAL;

Re: [PATCH v3 01/19] block/block-copy: move copy_bitmap initialization to block_copy_state_new()


On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:

We are going to complicate bitmap initialization in the further
commit. And in future, backup job will be able to work without filter
(when source is immutable), so we'll need same bitmap initialization in
copy-before-write filter and in backup job. So, it's reasonable to do
it in block-copy.

Note that for now cbw_open() is the only caller of
block_copy_state_new().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/block-copy.c| 1 +
  block/copy-before-write.c | 4 
  2 files changed, 1 insertion(+), 4 deletions(-)


Reviewed-by: Hanna Reitz

Re: [PATCH v3 02/19] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value


On 22.12.21 18:40, Vladimir Sementsov-Ogievskiy wrote:

That simplifies handling failure in existing code and in further new
usage of bdrv_merge_dirty_bitmap().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/dirty-bitmap.h| 2 +-
  block/dirty-bitmap.c| 9 +++--
  block/monitor/bitmap-qmp-cmds.c | 5 +
  3 files changed, 9 insertions(+), 7 deletions(-)


Reviewed-by: Hanna Reitz

Re: [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush

2022-01-14 Thread Idan Horowitz

Alex Bennée  wrote:
>
>
> For multi-patch series please include a cover letter which is the parent
> of all the patches. This is the default for git-send-email.
>

Sorry, I will do so from now on.

>
> The code itself looks fine but what sort of improvements are we talking
> about here? What measurements have you taken and under what conditions?
>

Execution time of the following assembly snippet in TCG
aarch64-softmmu with icount (shift=10) enabled decreased from 4
minutes and 53 seconds to below 1 second:

movk x0, #0x0
movk x0, #0x0, lsl #16
movk x0, #0xff80, lsl #32
movk x0, #0x0, lsl #48
mov  x9, #0x64
str  x9, [x8]
tlbi rvae1, x0
ldr  x9, [x8]
sub  x9, x9, #0x1
str  x9, [x8]
cbnz x9, 0x5168abc8
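
For anyone following along, the shape of the optimization can be sketched in plain C. This is not the QEMU code: `JumpCache`, `CACHE_SLOTS`, the 4 KiB page size and the one-slot-per-page hashing are assumptions made up for illustration, and `addr` is assumed to be page aligned.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_BITS   12                    /* assumed 4 KiB pages */
#define PAGE_SIZE   (1ull << PAGE_BITS)
#define CACHE_SLOTS 4096                  /* hypothetical cache size */

typedef struct {
    uint64_t slot[CACHE_SLOTS];           /* 0 means "empty" */
} JumpCache;

static unsigned slot_for_page(uint64_t page)
{
    return (unsigned)(page % CACHE_SLOTS);
}

/*
 * Flush the cache entries for [addr, addr + len).
 * Returns the number of slots written, so both paths can be compared.
 */
static size_t jump_cache_flush_range(JumpCache *jc, uint64_t addr, uint64_t len)
{
    /* Round up without overflowing for very large lengths. */
    uint64_t pages = (len >> PAGE_BITS) + ((len & (PAGE_SIZE - 1)) != 0);

    if (pages >= CACHE_SLOTS) {
        /* Range covers at least as many pages as there are slots:
         * clearing everything is cheaper than walking the range. */
        memset(jc, 0, sizeof(*jc));
        return CACHE_SLOTS;
    }

    for (uint64_t i = 0; i < pages; i++) {
        jc->slot[slot_for_page((addr >> PAGE_BITS) + i)] = 0;
    }
    return (size_t)pages;
}
```

With a range like the one in the snippet above, the per-page loop would run for a very large number of pages on every iteration, while the whole-cache path does a fixed amount of work.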

>
> --
> Alex Bennée

Idan Horowitz

Re: [PATCH 1/1] virtio: fix the condition for iommu_platform not supported

2022-01-14 Thread Halil Pasic

On Thu, 13 Jan 2022 20:54:52 +0100
Halil Pasic  wrote:

> > > This is the very reason for which commit 7ef7e6e3b ("vhost: correctly
> > > turn on VIRTIO_F_IOMMU_PLATFORM") for, which fences _F_ACCESS_PLATFORM
> > > form the vhost device that does not need it, because on the vhost
> > > interface it only means "I/O address translation is needed".
> > > 
> > > This patch takes inspiration from 7ef7e6e3b ("vhost: correctly turn on
> > > VIRTIO_F_IOMMU_PLATFORM"),
> > 
> > Strange, I could not find this commit. Did you mean f7ef7e6e3b?
> >   
> 
> Right! Copy-paste error.
> 
> 

Should I spin a v2 to correct this?


Sorry for the hunk below. I wanted to post the whole patch in question,
then deleted it, but left some leftovers. Another copy-paste error. Grrr

>  
>  static void *vhost_memory_map(struct vhost_dev *dev, hwaddr addr,
> @@ -765,6 +772,9 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
>  if (enable_log) {
>  features |= 0x1ULL << VHOST_F_LOG_ALL;
>  }
> +if (!vhost_dev_has_iommu(dev)) {
> +features &= ~(0x1ULL << VIRTIO_F_IOMMU_PLATFORM);
> +}
>  r = dev->vhost_ops->vhost_set_features(dev, features);
>  if (r < 0) {
>  VHOST_OPS_DEBUG("vhost_set_features failed");
> 
> > > and uses the same condition for detecting the

[PATCH] hw/riscv: spike: Allow using binary firmware as bios

2022-01-14 Thread Anup Patel

Currently, we have to use OpenSBI firmware ELF as bios for the spike
machine because the HTIF console requires ELF for parsing "fromhost"
and "tohost" symbols.

The latest OpenSBI can now optionally pick-up HTIF register address
from HTIF DT node so using this feature spike machine can now use
OpenSBI firmware BIN as bios.
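
As a rough illustration of the address selection described above, the following standalone sketch computes the MMIO window either from the ELF symbol addresses or from a fixed base. It is not the patch itself: `HtifWindow` and `htif_window()` are hypothetical names, and the layout (fromhost at the base, tohost 8 bytes after it, 8 bytes per register) simply mirrors what the diff below does in the non-ELF case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the MMIO window covering both registers. */
typedef struct {
    uint64_t base;
    uint64_t size;
    uint64_t tohost_offset;
    uint64_t fromhost_offset;
} HtifWindow;

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

static HtifWindow htif_window(bool have_elf_symbols,
                              uint64_t tohost, uint64_t fromhost,
                              uint64_t nonelf_base)
{
    HtifWindow w;

    if (!have_elf_symbols) {
        /* No symbols to parse: fall back to the fixed register layout. */
        fromhost = nonelf_base;
        tohost = nonelf_base + 8;
    }

    w.base = min_u64(tohost, fromhost);
    w.size = max_u64(tohost + 8, fromhost + 8) - w.base;
    w.tohost_offset = tohost - w.base;
    w.fromhost_offset = fromhost - w.base;
    return w;
}
```

In the non-ELF case the window is always 16 bytes long and starts at the fixed base, which is what lets the firmware learn the register address from the DT node instead of from symbols.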

Signed-off-by: Anup Patel 
---
 hw/char/riscv_htif.c | 33 +++--
 hw/riscv/spike.c | 41 ++--
 include/hw/char/riscv_htif.h |  5 -
 include/hw/riscv/spike.h |  1 +
 4 files changed, 52 insertions(+), 28 deletions(-)

diff --git a/hw/char/riscv_htif.c b/hw/char/riscv_htif.c
index ddae738d56..b59d321fb7 100644
--- a/hw/char/riscv_htif.c
+++ b/hw/char/riscv_htif.c
@@ -228,13 +228,25 @@ static const MemoryRegionOps htif_mm_ops = {
 .write = htif_mm_write,
 };
 
+bool htif_uses_elf_symbols(void)
+{
+return (address_symbol_set == 3) ? true : false;
+}
+
 HTIFState *htif_mm_init(MemoryRegion *address_space, MemoryRegion *main_mem,
-CPURISCVState *env, Chardev *chr)
+CPURISCVState *env, Chardev *chr, uint64_t nonelf_base)
 {
-uint64_t base = MIN(tohost_addr, fromhost_addr);
-uint64_t size = MAX(tohost_addr + 8, fromhost_addr + 8) - base;
-uint64_t tohost_offset = tohost_addr - base;
-uint64_t fromhost_offset = fromhost_addr - base;
+uint64_t base, size, tohost_offset, fromhost_offset;
+
+if (address_symbol_set != 3) {
+fromhost_addr = nonelf_base;
+tohost_addr = nonelf_base + 8;
+}
+
+base = MIN(tohost_addr, fromhost_addr);
+size = MAX(tohost_addr + 8, fromhost_addr + 8) - base;
+tohost_offset = tohost_addr - base;
+fromhost_offset = fromhost_addr - base;
 
 HTIFState *s = g_malloc0(sizeof(HTIFState));
 s->address_space = address_space;
@@ -249,12 +261,11 @@ HTIFState *htif_mm_init(MemoryRegion *address_space, 
MemoryRegion *main_mem,
 qemu_chr_fe_init(&s->chr, chr, &error_abort);
 qemu_chr_fe_set_handlers(&s->chr, htif_can_recv, htif_recv, htif_event,
 htif_be_change, s, NULL, true);
-if (address_symbol_set == 3) {
-memory_region_init_io(&s->mmio, NULL, &htif_mm_ops, s,
-  TYPE_HTIF_UART, size);
-memory_region_add_subregion_overlap(address_space, base,
-&s->mmio, 1);
-}
+
+memory_region_init_io(&s->mmio, NULL, &htif_mm_ops, s,
+  TYPE_HTIF_UART, size);
+memory_region_add_subregion_overlap(address_space, base,
+&s->mmio, 1);
 
 return s;
 }
diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index 288d69cd9f..597df4c288 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -42,6 +42,7 @@
 
 static const MemMapEntry spike_memmap[] = {
 [SPIKE_MROM] = { 0x1000, 0xf000 },
+[SPIKE_HTIF] = {  0x100, 0x1000 },
 [SPIKE_CLINT] ={  0x200,0x1 },
 [SPIKE_DRAM] = { 0x8000,0x0 },
 };
@@ -75,6 +76,10 @@ static void create_fdt(SpikeState *s, const MemMapEntry 
*memmap,
 
 qemu_fdt_add_subnode(fdt, "/htif");
 qemu_fdt_setprop_string(fdt, "/htif", "compatible", "ucb,htif0");
+if (!htif_uses_elf_symbols()) {
+qemu_fdt_setprop_cells(fdt, "/htif", "reg",
+0x0, memmap[SPIKE_HTIF].base, 0x0, memmap[SPIKE_HTIF].size);
+}
 
 qemu_fdt_add_subnode(fdt, "/soc");
 qemu_fdt_setprop(fdt, "/soc", "ranges", NULL, 0);
@@ -172,6 +177,7 @@ static void create_fdt(SpikeState *s, const MemMapEntry 
*memmap,
 if (cmdline) {
 qemu_fdt_add_subnode(fdt, "/chosen");
 qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
+qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", "/htif");
 }
 }
 
@@ -241,10 +247,6 @@ static void spike_board_init(MachineState *machine)
 memory_region_add_subregion(system_memory, memmap[SPIKE_DRAM].base,
 machine->ram);
 
-/* create device tree */
-create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline,
-   riscv_is_32bit(&s->soc[0]));
-
 /* boot rom */
 memory_region_init_rom(mask_rom, NULL, "riscv.spike.mrom",
memmap[SPIKE_MROM].size, &error_fatal);
@@ -266,6 +268,7 @@ static void spike_board_init(MachineState *machine)
 htif_symbol_callback);
 }
 
+/* Load kernel */
 if (machine->kernel_filename) {
 kernel_start_addr = riscv_calc_kernel_start_addr(&s->soc[0],
  firmware_end_addr);
@@ -273,17 +276,6 @@ static void spike_board_init(MachineState *machine)
 kernel_entry = riscv_load_kernel(machine->kernel_filename,
  kernel_start_addr,
  htif_symbol_callback);
-
-if (machine->initrd_filename) {
-hwaddr start;
-

Re: [PATCH 4/6] migration: Add ram-only capability

2022-01-14 Thread Markus Armbruster

Daniel P. Berrangé  writes:

> On Fri, Jan 14, 2022 at 12:22:13PM +0100, Markus Armbruster wrote:
>> Nikita Lapshin  writes:
>> 
>> > If this capability is enabled migration stream
>> > will have RAM section only.
>> >
>> > Signed-off-by: Nikita Lapshin 
>> 
>> [...]
>> 
>> > diff --git a/qapi/migration.json b/qapi/migration.json
>> > index d53956852c..626fc59d14 100644
>> > --- a/qapi/migration.json
>> > +++ b/qapi/migration.json
>> > @@ -454,6 +454,8 @@
>> >  #
>> >  # @no-ram: If enabled, migration stream won't contain any ram in it. 
>> > (since 7.0)
>> >  #
>> > +# @ram-only: If enabled, only RAM sections will be sent. (since 7.0)
>> > +#
>> 
>> What happens when I ask for 'no-ram': true, 'ram-only': true?
>
> So IIUC
>
>   no-ram=false, ram-only=false =>  RAM + vmstate
>   no-ram=true, ram-only=false => vmstate
>   no-ram=false, ram-only=true =>  RAM
>   no-ram=true, ram-only=true => nothing to send ?
>
> I find that the fact that one flag is a negative request and
> the other flag is a positive request to be confusing.

Me too.

> If we must have two flags then could we at least use the same
> style for both. ie either
>
>   @no-ram
>   @no-vmstate
>
> Or
>
>   @ram-only
>   @vmstate-only

I strongly prefer "positive" names for booleans, avoiding double
negation.

> Since the code enforces these flags are mutually exclusive
> though, it might point towards being handled by a enum
>
>   { 'enum': 'MigrationStreamContent',
> 'data': ['both', 'ram', 'vmstate'] }

Enumerating the combinations that are actually valid is often nicer than
a set of flags that look independent, but aren't.

MigrationCapability can only do flags.  For an enum, we'd have to use
MigrationParameters & friends.  For an example, check out
@multifd-compression there.
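
To make the comparison concrete, here is a small standalone C sketch (not QEMU or QAPI-generated code) that maps the two proposed flags onto the suggested three-value enum. `StreamContent`, `flags_to_content()`, `sends_ram()` and `sends_vmstate()` are names invented for this example.

```c
#include <stdbool.h>

/* Mirrors the 'both' / 'ram' / 'vmstate' values suggested above. */
typedef enum {
    STREAM_BOTH,
    STREAM_RAM,
    STREAM_VMSTATE,
} StreamContent;

/*
 * Translate the two boolean flags into the enum.
 * Returns false for the one combination that has no sensible meaning
 * (no-ram together with ram-only); *out is left untouched in that case.
 */
static bool flags_to_content(bool no_ram, bool ram_only, StreamContent *out)
{
    if (no_ram && ram_only) {
        return false;
    }
    if (no_ram) {
        *out = STREAM_VMSTATE;
    } else if (ram_only) {
        *out = STREAM_RAM;
    } else {
        *out = STREAM_BOTH;
    }
    return true;
}

static bool sends_ram(StreamContent c)     { return c != STREAM_VMSTATE; }
static bool sends_vmstate(StreamContent c) { return c != STREAM_RAM; }
```

With the flags, the invalid combination has to be rejected at run time; with the enum it cannot be expressed in the first place.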

> none of these approaches are especially future proof if we ever
> need fine grained control over sending a sub-set of the non-RAM
> vmstate. Not sure if that matters in the end.
>
>
> Regards,
> Daniel

Re: [PATCH] linux-user: Remove stale "not threadsafe" comments

On Fri, 14 Jan 2022 at 15:50, Peter Maydell  wrote:
>
> In linux-user/signal.c we have two FIXME comments claiming that
> parts of the signal-handling code are not threadsafe. These are
> very old, as they were first introduced in commit 624f7979058
> in 2008. Since then we've radically overhauled the signal-handling
> logic, while carefully preserving these FIXME comments.

Oops, I meant to send this as RFC, not PATCH -- I hit ^C
on the send but obviously not quite in time to stop it getting
out of the door. Ignore this one, I've sent it again with
the right tag.

thanks
-- PMM

Re: [PATCH 3/3] linux-user: Return void from queue_signal()


On 14/1/22 16:37, Peter Maydell wrote:

The linux-user queue_signal() function always returns 1, and none of
its callers check the return value.  Give it a void return type
instead.

The return value is a leftover from the old pre-2016 linux-user
signal handling code, which really did have a queue of signals and so
might return a failure indication if too many signals were queued at
once.  The current design avoids having to ever have more than one
signal queued via queue_signal() at once, so it can never fail.

Signed-off-by: Peter Maydell 
---
  linux-user/signal-common.h | 4 ++--
  linux-user/signal.c| 5 ++---
  2 files changed, 4 insertions(+), 5 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PULL 00/20] Kraxel 20220114 patches

On Fri, 14 Jan 2022 at 06:54, Gerd Hoffmann  wrote:
>
> The following changes since commit 91f5f7a5df1fda8c34677a7c49ee8a4bb5b56a36:
>
>   Merge remote-tracking branch 
> 'remotes/lvivier-gitlab/tags/linux-user-for-7.0-pull-request' into staging 
> (2022-01-12 11:51:47 +)
>
> are available in the Git repository at:
>
>   git://git.kraxel.org/qemu tags/kraxel-20220114-pull-request
>
> for you to fetch changes up to 17f6315ef883a142b6a41a491b63a6554e784a5c:
>
>   ui/input-legacy: pass horizontal scroll information (2022-01-13 15:33:18 
> +0100)
>
> 
> - bugfixes for ui, usb, audio, display
> - change default display resolution
> - add horizontal scrolling support
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM

Re: [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush

Idan Horowitz  writes:

> When the length of the range is large enough, clearing the whole cache is
> faster than iterating over the (possibly extremely large) set of pages
> contained in the range.
>
> This mimics the pre-existing similar optimization done on the flush of the
> tlb itself.
>
> Signed-off-by: Idan Horowitz 

For multi-patch series please include a cover letter which is the parent
of all the patches. This is the default for git-send-email.

The code itself looks fine but what sort of improvements are we talking
about here? What measurements have you taken and under what conditions?

-- 
Alex Bennée

[PATCH 3/3] linux-user: Return void from queue_signal()

The linux-user queue_signal() function always returns 1, and none of
its callers check the return value.  Give it a void return type
instead.

The return value is a leftover from the old pre-2016 linux-user
signal handling code, which really did have a queue of signals and so
might return a failure indication if too many signals were queued at
once.  The current design avoids having to ever have more than one
signal queued via queue_signal() at once, so it can never fail.

Signed-off-by: Peter Maydell 
---
 linux-user/signal-common.h | 4 ++--
 linux-user/signal.c| 5 ++---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/linux-user/signal-common.h b/linux-user/signal-common.h
index 42aa479080b..2113165a758 100644
--- a/linux-user/signal-common.h
+++ b/linux-user/signal-common.h
@@ -59,8 +59,8 @@ void setup_rt_frame(int sig, struct target_sigaction *ka,
 
 void process_pending_signals(CPUArchState *cpu_env);
 void signal_init(void);
-int queue_signal(CPUArchState *env, int sig, int si_type,
- target_siginfo_t *info);
+void queue_signal(CPUArchState *env, int sig, int si_type,
+  target_siginfo_t *info);
 void host_to_target_siginfo(target_siginfo_t *tinfo, const siginfo_t *info);
 void target_to_host_siginfo(siginfo_t *info, const target_siginfo_t *tinfo);
 int target_to_host_signal(int sig);
diff --git a/linux-user/signal.c b/linux-user/signal.c
index bfbbeab9ad2..32854bb3752 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -780,8 +780,8 @@ static void QEMU_NORETURN dump_core_and_abort(int 
target_sig)
 
 /* queue a signal so that it will be send to the virtual CPU as soon
as possible */
-int queue_signal(CPUArchState *env, int sig, int si_type,
- target_siginfo_t *info)
+void queue_signal(CPUArchState *env, int sig, int si_type,
+  target_siginfo_t *info)
 {
 CPUState *cpu = env_cpu(env);
 TaskState *ts = cpu->opaque;
@@ -794,7 +794,6 @@ int queue_signal(CPUArchState *env, int sig, int si_type,
 ts->sync_signal.pending = sig;
 /* signal that a new signal is pending */
 qatomic_set(&ts->signal_pending, 1);
-return 1; /* indicates that the signal was queued */
 }
 
 
-- 
2.25.1

[RFC] linux-user: Remove stale "not threadsafe" comments

In linux-user/signal.c we have two FIXME comments claiming that
parts of the signal-handling code are not threadsafe. These are
very old, as they were first introduced in commit 624f7979058
in 2008. Since then we've radically overhauled the signal-handling
logic, while carefully preserving these FIXME comments.

It's unclear exactly what thread-safety issue the original
author was trying to point out -- the relevant data structures
are in the TaskState, which makes them per-thread and only
operated on by that thread. The old code at the time of that
commit did have various races involving signal handlers being
invoked at awkward times; possibly this was what was meant.

Delete these FIXME comments:
 * they were written at a time when the way we handled
   signals was completely different
 * the code today appears to us to not have thread-safety issues
 * nobody knows what the problem the comments were trying to
   point out was
so they are serving no useful purpose for us today.

Signed-off-by: Peter Maydell 
---
Marked "RFC" because I'm a bit uneasy with deleting FIXMEs
simply because I can't personally figure out why they're
there. This patch is more to start a discussion to see
if anybody does understand the issue -- in which case we
can instead augment the comments to describe it.
---
 linux-user/signal.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index 32854bb3752..e7410776e21 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -1001,7 +1001,6 @@ int do_sigaction(int sig, const struct target_sigaction 
*act,
 oact->sa_mask = k->sa_mask;
 }
 if (act) {
-/* FIXME: This is not threadsafe.  */
 __get_user(k->_sa_handler, &act->_sa_handler);
 __get_user(k->sa_flags, &act->sa_flags);
 #ifdef TARGET_ARCH_HAS_SA_RESTORER
@@ -1151,7 +1150,6 @@ void process_pending_signals(CPUArchState *cpu_env)
 sigset_t *blocked_set;
 
 while (qatomic_read(&ts->signal_pending)) {
-/* FIXME: This is not threadsafe.  */
 sigfillset(&set);
 sigprocmask(SIG_SETMASK, &set, 0);
 
-- 
2.25.1

[PATCH 1/3] linux-user: Remove unnecessary 'aligned' attribute from TaskState

The linux-user struct TaskState has an 'aligned(16)' attribute.  When
the struct was first added in commit 851e67a1b46f in 2003, there was
a justification in a comment (still present in the source today):

/* NOTE: we force a big alignment so that the stack stored after is
   aligned too */

because the final field in the struct was "uint8_t stack[0];"
But that field was removed in commit 48e15fc2d in 2010 which
switched us to allocating the stack and the TaskState separately.
Because we allocate the structure with g_new0() rather than as
a local variable, the attribute made no difference to the alignment
of the structure anyway.

Remove the unnecessary attribute, and the corresponding comment.
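
The effect of the attribute can be seen in a small standalone example. This is only a sketch, not the QEMU struct: `struct Plain` and `struct ForcedAlign` are made-up types, plain `malloc()` stands in for g_new0(), and the `__attribute__` syntax assumes GCC or Clang with C11.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same fields, with and without the forced alignment. */
struct Plain {
    char tag;
    int value;
};

struct ForcedAlign {
    char tag;
    int value;
} __attribute__((aligned(16)));

static bool is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/*
 * The allocator only sees a byte count, not the type, so the block it
 * returns is aligned for max_align_t whether or not the struct carries
 * an aligned attribute.
 */
static bool alloc_is_max_aligned(size_t size)
{
    void *p = malloc(size);
    bool ok = p && is_aligned(p, alignof(max_align_t));
    free(p);
    return ok;
}
```

The attribute changes what `alignof` and `sizeof` report for the type, which matters for local variables and for padding, but it does not change what the heap allocator hands back.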

Signed-off-by: Peter Maydell 
---
 linux-user/qemu.h | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 5c713fa8ab2..bd0559759ae 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -96,10 +96,6 @@ struct emulated_sigtable {
 target_siginfo_t info;
 };
 
-/*
- * NOTE: we force a big alignment so that the stack stored after is
- * aligned too
- */
 typedef struct TaskState {
 pid_t ts_tid; /* tid (or pid) of this task */
 #ifdef TARGET_ARM
@@ -160,7 +156,7 @@ typedef struct TaskState {
 
 /* This thread's sigaltstack, if it has one */
 struct target_sigaltstack sigaltstack_used;
-} __attribute__((aligned(16))) TaskState;
+} TaskState;
 
 abi_long do_brk(abi_ulong new_brk);
 
-- 
2.25.1

[PATCH] linux-user: Remove stale "not threadsafe" comments

In linux-user/signal.c we have two FIXME comments claiming that
parts of the signal-handling code are not threadsafe. These are
very old, as they were first introduced in commit 624f7979058
in 2008. Since then we've radically overhauled the signal-handling
logic, while carefully preserving these FIXME comments.

It's unclear exactly what thread-safety issue the original
author was trying to point out -- the relevant data structures
are in the TaskState, which makes them per-thread and only
operated on by that thread. The old code at the time of that
commit did have various races involving signal handlers being
invoked at awkward times; possibly this was what was meant.

Delete these FIXME comments:
 * they were written at a time when the way we handled
   signals was completely different
 * the code today appears to us to not have thread-safety issues
 * nobody knows what the problem the comments were trying to
   point out was
so they are serving no useful purpose for us today.

Signed-off-by: Peter Maydell 
---
Marked "RFC" because I'm a bit uneasy with deleting FIXMEs
simply because I can't personally figure out why they're
there. This patch is more to start a discussion to see
if anybody does understand the issue -- in which case we
can instead augment the comments to describe it.
---
 linux-user/signal.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index 32854bb3752..e7410776e21 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -1001,7 +1001,6 @@ int do_sigaction(int sig, const struct target_sigaction 
*act,
 oact->sa_mask = k->sa_mask;
 }
 if (act) {
-/* FIXME: This is not threadsafe.  */
 __get_user(k->_sa_handler, &act->_sa_handler);
 __get_user(k->sa_flags, &act->sa_flags);
 #ifdef TARGET_ARCH_HAS_SA_RESTORER
@@ -1151,7 +1150,6 @@ void process_pending_signals(CPUArchState *cpu_env)
 sigset_t *blocked_set;
 
 while (qatomic_read(&ts->signal_pending)) {
-/* FIXME: This is not threadsafe.  */
 sigfillset(&set);
 sigprocmask(SIG_SETMASK, &set, 0);
 
-- 
2.25.1

Re: [PATCH v2 2/3] migration: Add canary to VMSTATE_END_OF_LIST


On 14/1/22 12:32, Philippe Mathieu-Daudé wrote:

On 13/1/22 20:44, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

We fairly regularly forget VMSTATE_END_OF_LIST markers off descriptions;
given that the current check is only for ->name being NULL, sometimes
we get unlucky and the code apparently works and no one spots the error.

Explicitly add a flag, VMS_END that should be set, and assert it is
set during the traversal.
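
The idea can be shown with a tiny standalone example. It is a sketch only and does not use the real vmstate types: `Field`, `FIELD_END`, `END_OF_LIST` and `count_fields()` are stand-ins invented here for the field descriptor, the proposed VMS_END flag, the end marker macro and the traversal.

```c
#include <assert.h>
#include <stddef.h>

enum { FIELD_END = 1 << 0 };        /* stand-in for the proposed VMS_END */

typedef struct {
    const char *name;               /* NULL terminates the list */
    unsigned flags;
} Field;

/* Stand-in for the end-of-list marker: NULL name plus the canary flag. */
#define END_OF_LIST { NULL, FIELD_END }

static size_t count_fields(const Field *f)
{
    size_t n = 0;

    while (f->name) {
        n++;
        f++;
    }
    /*
     * A NULL name alone might be stray zeroed memory after an array that
     * forgot its terminator; the explicit flag tells the two cases apart.
     */
    assert(f->flags & FIELD_END);
    return n;
}
```

A list that ends in `END_OF_LIST` passes the assertion, while a list that only happens to be followed by zeroed memory trips it instead of silently appearing to work.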

Note: This can't go in until we update the copy of vmstate.h in slirp.


Do we need a libslirp buildsys version check to get this patch merged?


In that case we should use an intermediate function which would
eventually call assert() after checking SLIRP_MAJOR_VERSION/...
values.


Reviewed-by: Philippe Mathieu-Daudé 


Suggested-by: Peter Maydell 
Signed-off-by: Dr. David Alan Gilbert 
---
  include/migration/vmstate.h | 7 ++-
  migration/savevm.c  | 1 +
  migration/vmstate.c | 2 ++
  3 files changed, 9 insertions(+), 1 deletion(-)

[PATCH v6 00/12] Xilinx Versal's PMC SLCR and OSPI support

Hi,

This series attempts to add support for Xilinx Versal's PMC SLCR
(system-level control registers) and OSPI flash memory controller to
Xilinx Versal virt machine.

The series starts by adding a model of Versal's PMC SLCR and connecting
the model to the Versal virt machine. The series then adds a couple of
headers into xlnx_csu_dma.h needed for building and reusing it later
with the OSPI. The series thereafter introduces a DMA control interface
and implements the interface in xlnx_csu_dma so that the DMA can be
reused and controlled by the OSPI controller. Thereafter a model of
Versal's OSPI controller is added and connected to the Versal virt
machine. The series then ends by adding initial support for the Micron
Xccela mt35xu01g flash, and flashes of this type are connected to the
OSPI in the Versal virt machine.

Best regards,
Francisco Iglesias

Changelog:
v5 -> v6:
  * Corrected unimplemented log messages (patch: "hw/arm/xlnx-versal: Connect
Versal's PMC SLCR")
  * Modify dma_ctrl_if_read to return a MemTxResult carrying the result of the
read operation
  * Updated (and corrected) documentation

v4 -> v5
  * Use named GPIOs for "sd-emmc-sel", "qspi-ospi-mux-sel", "ospi-mux-sel"
in the PMC SLCR model
  * Add a QEMU interface comment for the PMC SLCR model.
  * Switch to use OBJECT_DECLARE_SIMPLE_TYPE in both "xlnx-versal-ospi.h"
and "xlnx-versal-pmc-iou-slcr.h"
  * Create a new patch 'or'ing the interrupts from the BBRAM and RTC model
  * 'Or' the interrupt from the PMC SLCR with the BBRAM and RTC interrupt
inside 'xlnx-versal.c'
  * Connect other not yet implemented PMC SLCR GPIOs to unimplemented messages
  * Reworked and simplified the DMA control interface by removing the
notifier and refill mechanism
  * Corrected various typos and grammatical errors in the DMA control
interface documentation and comments
  * Updated the DMA control interface documentation to describe the new
simplified implementation
  * Use ldl_le_p and ldq_le_p in the OSPI model (and remove the OSPIRdData
union). Also assert in the locations that we are not overrunning the
new bytes buffer.
  * Correct the single_cs function in the OSPI model (both comment and output)
  * Correct a typo in a comment inside ospi_do_indirect_write
(s/boundery/boundary/)
  * Remove an unnecessary assert in the OSPI model
  * Add a QEMU interface comment for the OSPI model.
  * Rename the OSPI irq in 'xlnx-versal.c' to include 'orgate' in the name for
clarifying

v3 -> v4
  * Correct indentation (patch: "hw/arm/xlnx-versal: Connect Versal's PMC
SLCR")

  * Rename to include "If" in names related to the DMA control interface
  * In dma-ctrl-if.h:
- Don't include qemu-common.h
- Use DECLARE_CLASS_CHECKERS dma-ctrl.h
  * Add a docs/devel documentation patch for the DMA control interface
  * Improve git messages on the dma-ctrl-if patches


v2 -> v3
  * Correct and also include hw/sysbus.h and hw/register.h into
xlnx_csu_dma.h (patch: "include/hw/dma/xlnx_csu_dma: Add in missing
includes in the header")

v1 -> v2
  * Correct the reset in the PMC SLCR model
  * Create a sub structure for the OSPI in the Versal structure (in patch:
"hw/arm/xlnx-versal: Connect the OSPI flash memory controller model")
  * Change to use 'drive_get' instead of 'drive_get_next' (in patch:
"hw/arm/xlnx-versal-virt: Connect mt35xu01g flashes to the OSPI")
  * Add a maintainers patch and list myself as maintainer for the OSPI
controller


Francisco Iglesias (12):
  hw/misc: Add a model of Versal's PMC SLCR
  hw/arm/xlnx-versal: 'Or' the interrupts from the BBRAM and RTC models
  hw/arm/xlnx-versal: Connect Versal's PMC SLCR
  include/hw/dma/xlnx_csu_dma: Add in missing includes in the header
  hw/dma: Add the DMA control interface
  hw/dma/xlnx_csu_dma: Implement the DMA control interface
  hw/ssi: Add a model of Xilinx Versal's OSPI flash memory controller
  hw/arm/xlnx-versal: Connect the OSPI flash memory controller model
  hw/block/m25p80: Add support for Micron Xccela flash mt35xu01g
  hw/arm/xlnx-versal-virt: Connect mt35xu01g flashes to the OSPI
  MAINTAINERS: Add an entry for Xilinx Versal OSPI
  docs/devel: Add documentation for the DMA control interface

 MAINTAINERS|7 +
 docs/devel/dma-ctrl-if.rst |  243 
 docs/devel/index.rst   |1 +
 hw/arm/xlnx-versal-virt.c  |   25 +-
 hw/arm/xlnx-versal.c   |  190 ++-
 hw/block/m25p80.c  |2 +
 hw/dma/dma-ctrl-if.c   |   30 +
 hw/dma/meson.build |1 +
 hw/dma/xlnx_csu_dma.c  |   20 +
 hw/misc/meson.build|5 +-
 hw/misc/xlnx-versal-pmc-iou-slcr.c | 1446 ++
 hw/ssi/meson.build |1 +
 hw/ssi/xlnx-versal-ospi.c  | 1856 ++

[PATCH v6 12/12] docs/devel: Add documentation for the DMA control interface

Also, as the author, list myself as maintainer for the file.

Signed-off-by: Francisco Iglesias 
---
 MAINTAINERS|   1 +
 docs/devel/dma-ctrl-if.rst | 243 +
 docs/devel/index.rst   |   1 +
 3 files changed, 245 insertions(+)
 create mode 100644 docs/devel/dma-ctrl-if.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 0e31569d65..5736ce0675 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -968,6 +968,7 @@ M: Francisco Iglesias 
 S: Maintained
 F: hw/ssi/xlnx-versal-ospi.c
 F: include/hw/ssi/xlnx-versal-ospi.h
+F: docs/devel/dma-ctrl-if.rst
 
 ARM ACPI Subsystem
 M: Shannon Zhao 
diff --git a/docs/devel/dma-ctrl-if.rst b/docs/devel/dma-ctrl-if.rst
new file mode 100644
index 00..15ef6491cd
--- /dev/null
+++ b/docs/devel/dma-ctrl-if.rst
@@ -0,0 +1,243 @@
+DMA control interface
+=====================
+
+About the DMA control interface
+-------------------------------
+
+DMA engines embedded in peripherals can end up being controlled in
+different ways on real hardware. One possible way is to allow software
+drivers to access the DMA engine's register API and to allow the drivers
+to configure and control DMA transfers through the API. A model of a DMA
+engine in QEMU that is embedded and (re)used in this manner does not need
+to implement the DMA control interface.
+
+Another option on real hardware is to allow the peripheral embedding the
+DMA engine to control the engine through a custom hardware DMA control
+interface between the two. Software drivers in this scenario configure and
+trigger DMA operations through the controlling peripheral's register API
+(for example, writing a specific bit in a register could propagate down to
+a transfer start signal on the DMA control interface). At the same time
+the status, result and interrupts for the transfer might still be intended
+to be read and caught through the DMA engine's register API (and
+signals).
+
+::
+
+    Hardware example
+                   +------------+
+                   |            |
+                   | Peripheral |
+                   |            |
+                   +------------+
+                        /\
+                        ||   DMA control IF (custom)
+                        \/
+                   +------------+
+                   | Peripheral |
+                   |    DMA     |
+                   +------------+
+
+Figure 1. A peripheral controlling its embedded DMA engine through a
+custom DMA control interface
+
+The above scenario can be modelled in QEMU by implementing this DMA control
+interface in the DMA engine model. This will allow a peripheral embedding
+the DMA engine to initiate DMA transfers through the engine using the
+interface. At the same time the status, result and interrupts for the
+transfer can be read and caught through the DMA engine's register API and
+signals. An example implementation and usage of the DMA control interface
+can be found in the Xilinx CSU DMA model and Xilinx Versal's OSPI model.
+
+::
+
+    Memory address
+    (register API)
+      0xf101       +---------+
+                   |         |
+                   | Versal  |
+                   |  OSPI   |
+                   |         |
+                   +---------+
+                       /\
+                       ||  DMA control IF
+                       \/
+      0xf1011000   +---------+
+                   |         |
+                   | CSU DMA |
+                   |  (src)  |
+                   |         |
+                   +---------+
+
+Figure 2. Xilinx Versal's OSPI controls and initiates transfers on its
+CSU source DMA through a DMA control interface
+
+DMA control interface files
+---------------------------
+
+``include/hw/dma/dma-ctrl-if.h``
+``hw/dma/dma-ctrl-if.c``
+
+DmaCtrlIfClass
+--------------
+
+The ``DmaCtrlIfClass`` contains the interface methods that can be
+implemented by a DMA engine.
+
+.. code-block:: c
+
+typedef struct DmaCtrlIfClass {
+InterfaceClass parent;
+
+/*
+ * read: Start a read transfer on the DMA engine implementing the DMA
+ * control interface
+ *
+ * @dma_ctrl: the DMA engine implementing this interface
+ * @addr: the address to read
+ * @len: the number of bytes to read at 'addr'
+ *
+ * @return a MemTxResult indicating whether the operation succeeded ('len'
+ * bytes were read) or failed.
+ */
+MemTxResult (*read)(DmaCtrlIf *dma, hwaddr addr, uint32_t len);
+} DmaCtrlIfClass;
+
+
+dma_ctrl_if_read
+----------------
+
+The ``dma_ctrl_if_read`` function is used from a model embedding the DMA engine
+for starting DMA read transfers.
+
+.. code-block:: c
+
+/*
+ * Start a read transfer on a DMA engine implementing the DMA control
+ * interface.
+ *
+ * @dma_ctrl: the DMA engine implementing this interface
+ * @addr: the address to read
+ * @len: the number

Re: [PATCH 2/3] linux-user: Rename user_force_sig tracepoint to match function name


On 14/1/22 16:37, Peter Maydell wrote:

In commit c599d4d6d6e9bfdb64 in 2016 we renamed the old force_sig()
function to dump_core_and_abort(), but we forgot to rename the
associated tracepoint.  Rename the tracepoint to match the
function it's called from.

Signed-off-by: Peter Maydell 
---
  linux-user/signal.c | 2 +-
  linux-user/trace-events | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

[PATCH v6 10/12] hw/arm/xlnx-versal-virt: Connect mt35xu01g flashes to the OSPI

Connect Micron Xccela mt35xu01g flashes to the OSPI flash memory
controller.

Signed-off-by: Francisco Iglesias 
Reviewed-by: Edgar E. Iglesias 
Reviewed-by: Peter Maydell 
---
 hw/arm/xlnx-versal-virt.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index 8ea9979710..3f56ae28ee 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -25,6 +25,8 @@
 #define TYPE_XLNX_VERSAL_VIRT_MACHINE MACHINE_TYPE_NAME("xlnx-versal-virt")
 OBJECT_DECLARE_SIMPLE_TYPE(VersalVirt, XLNX_VERSAL_VIRT_MACHINE)
 
+#define XLNX_VERSAL_NUM_OSPI_FLASH 4
+
 struct VersalVirt {
 MachineState parent_obj;
 
@@ -691,6 +693,27 @@ static void versal_virt_init(MachineState *machine)
 exit(EXIT_FAILURE);
 }
 }
+
+for (i = 0; i < XLNX_VERSAL_NUM_OSPI_FLASH; i++) {
+BusState *spi_bus;
+DeviceState *flash_dev;
+qemu_irq cs_line;
+DriveInfo *dinfo = drive_get(IF_MTD, 0, i);
+
+spi_bus = qdev_get_child_bus(DEVICE(&s->soc.pmc.iou.ospi), "spi0");
+
+flash_dev = qdev_new("mt35xu01g");
+if (dinfo) {
+qdev_prop_set_drive_err(flash_dev, "drive",
+blk_by_legacy_dinfo(dinfo), &error_fatal);
+}
+qdev_realize_and_unref(flash_dev, spi_bus, &error_fatal);
+
+cs_line = qdev_get_gpio_in_named(flash_dev, SSI_GPIO_CS, 0);
+
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->soc.pmc.iou.ospi),
+   i + 1, cs_line);
+}
 }
 
 static void versal_virt_machine_instance_init(Object *obj)
-- 
2.11.0

[PATCH v6 08/12] hw/arm/xlnx-versal: Connect the OSPI flash memory controller model

Connect the OSPI flash memory controller model (including the source and
destination DMA).

Signed-off-by: Francisco Iglesias 
Reviewed-by: Peter Maydell 
---
 hw/arm/xlnx-versal.c | 93 
 include/hw/arm/xlnx-versal.h | 20 ++
 2 files changed, 113 insertions(+)

diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index c8c0c102c7..ab58bebfd2 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -28,6 +28,7 @@
 #define GEM_REVISION 0x40070106
 
 #define VERSAL_NUM_PMC_APB_IRQS 3
+#define NUM_OSPI_IRQ_LINES 3
 
 static void versal_create_apu_cpus(Versal *s)
 {
@@ -412,6 +413,97 @@ static void versal_create_pmc_iou_slcr(Versal *s, qemu_irq *pic)
qdev_get_gpio_in(DEVICE(&s->pmc.apb_irq_orgate), 2));
 }
 
+static void versal_create_ospi(Versal *s, qemu_irq *pic)
+{
+SysBusDevice *sbd;
+MemoryRegion *mr_dac;
+qemu_irq ospi_mux_sel;
+DeviceState *orgate;
+
+memory_region_init(&s->pmc.iou.ospi.linear_mr, OBJECT(s),
+   "versal-ospi-linear-mr" , MM_PMC_OSPI_DAC_SIZE);
+
+object_initialize_child(OBJECT(s), "versal-ospi", &s->pmc.iou.ospi.ospi,
+TYPE_XILINX_VERSAL_OSPI);
+
+mr_dac = sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->pmc.iou.ospi.ospi), 1);
+memory_region_add_subregion(&s->pmc.iou.ospi.linear_mr, 0x0, mr_dac);
+
+/* Create the OSPI destination DMA */
+object_initialize_child(OBJECT(s), "versal-ospi-dma-dst",
+&s->pmc.iou.ospi.dma_dst,
+TYPE_XLNX_CSU_DMA);
+
+object_property_set_link(OBJECT(&s->pmc.iou.ospi.dma_dst),
+"dma", OBJECT(get_system_memory()),
+ &error_abort);
+
+sbd = SYS_BUS_DEVICE(&s->pmc.iou.ospi.dma_dst);
+sysbus_realize(sbd, &error_fatal);
+
+memory_region_add_subregion(&s->mr_ps, MM_PMC_OSPI_DMA_DST,
+sysbus_mmio_get_region(sbd, 0));
+
+/* Create the OSPI source DMA */
+object_initialize_child(OBJECT(s), "versal-ospi-dma-src",
+&s->pmc.iou.ospi.dma_src,
+TYPE_XLNX_CSU_DMA);
+
+object_property_set_bool(OBJECT(&s->pmc.iou.ospi.dma_src), "is-dst",
+ false, &error_abort);
+
+object_property_set_link(OBJECT(&s->pmc.iou.ospi.dma_src),
+"dma", OBJECT(mr_dac), &error_abort);
+
+object_property_set_link(OBJECT(&s->pmc.iou.ospi.dma_src),
+"stream-connected-dma",
+ OBJECT(&s->pmc.iou.ospi.dma_dst),
+ &error_abort);
+
+sbd = SYS_BUS_DEVICE(&s->pmc.iou.ospi.dma_src);
+sysbus_realize(sbd, &error_fatal);
+
+memory_region_add_subregion(&s->mr_ps, MM_PMC_OSPI_DMA_SRC,
+sysbus_mmio_get_region(sbd, 0));
+
+/* Realize the OSPI */
+object_property_set_link(OBJECT(&s->pmc.iou.ospi.ospi), "dma-src",
+ OBJECT(&s->pmc.iou.ospi.dma_src), &error_abort);
+
+sbd = SYS_BUS_DEVICE(&s->pmc.iou.ospi.ospi);
+sysbus_realize(sbd, &error_fatal);
+
+memory_region_add_subregion(&s->mr_ps, MM_PMC_OSPI,
+sysbus_mmio_get_region(sbd, 0));
+
+memory_region_add_subregion(&s->mr_ps, MM_PMC_OSPI_DAC,
+&s->pmc.iou.ospi.linear_mr);
+
+/* ospi_mux_sel */
+ospi_mux_sel = qdev_get_gpio_in_named(DEVICE(&s->pmc.iou.ospi.ospi),
+  "ospi-mux-sel", 0);
+qdev_connect_gpio_out_named(DEVICE(&s->pmc.iou.slcr), "ospi-mux-sel", 0,
+ospi_mux_sel);
+
+/* OSPI irq */
+object_initialize_child(OBJECT(s), "ospi-irq-orgate",
+&s->pmc.iou.ospi.irq_orgate, TYPE_OR_IRQ);
+object_property_set_int(OBJECT(&s->pmc.iou.ospi.irq_orgate),
+"num-lines", NUM_OSPI_IRQ_LINES, &error_fatal);
+
+orgate = DEVICE(&s->pmc.iou.ospi.irq_orgate);
+qdev_realize(orgate, NULL, &error_fatal);
+
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->pmc.iou.ospi.ospi), 0,
+   qdev_get_gpio_in(orgate, 0));
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->pmc.iou.ospi.dma_src), 0,
+   qdev_get_gpio_in(orgate, 1));
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->pmc.iou.ospi.dma_dst), 0,
+   qdev_get_gpio_in(orgate, 2));
+
+qdev_connect_gpio_out(orgate, 0, pic[VERSAL_OSPI_IRQ]);
+}
+
 /* This takes the board allocated linear DDR memory and creates aliases
  * for each split DDR range/aperture on the Versal address map.
  */
@@ -552,6 +644,7 @@ static void versal_realize(DeviceState *dev, Error **errp)
 versal_create_bbram(s, pic);
 versal_create_efuse(s, pic);
 versal_create_pmc_iou_slcr(s, pic);
+versal_create_ospi(s, pic);
 versal_map_ddr(s);
 versal_uni

[PATCH 2/3] linux-user: Rename user_force_sig tracepoint to match function name

In commit c599d4d6d6e9bfdb64 in 2016 we renamed the old force_sig()
function to dump_core_and_abort(), but we forgot to rename the
associated tracepoint.  Rename the tracepoint to match the
function it's called from.

Signed-off-by: Peter Maydell 
---
 linux-user/signal.c | 2 +-
 linux-user/trace-events | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index f813b4f18e4..bfbbeab9ad2 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -734,7 +734,7 @@ static void QEMU_NORETURN dump_core_and_abort(int target_sig)
 struct sigaction act;
 
 host_sig = target_to_host_signal(target_sig);
-trace_user_force_sig(env, target_sig, host_sig);
+trace_user_dump_core_and_abort(env, target_sig, host_sig);
 gdb_signalled(env, target_sig);
 
 /* dump core if supported by target binary format */
diff --git a/linux-user/trace-events b/linux-user/trace-events
index e7d2f54e940..f33717f248a 100644
--- a/linux-user/trace-events
+++ b/linux-user/trace-events
@@ -9,7 +9,7 @@ user_setup_frame(void *env, uint64_t frame_addr) "env=%p frame_addr=0x%"PRIx64
 user_setup_rt_frame(void *env, uint64_t frame_addr) "env=%p frame_addr=0x%"PRIx64
 user_do_rt_sigreturn(void *env, uint64_t frame_addr) "env=%p frame_addr=0x%"PRIx64
 user_do_sigreturn(void *env, uint64_t frame_addr) "env=%p frame_addr=0x%"PRIx64
-user_force_sig(void *env, int target_sig, int host_sig) "env=%p signal %d (host %d)"
+user_dump_core_and_abort(void *env, int target_sig, int host_sig) "env=%p signal %d (host %d)"
 user_handle_signal(void *env, int target_sig) "env=%p signal %d"
 user_host_signal(void *env, int host_sig, int target_sig) "env=%p signal %d (target %d)"
 user_queue_signal(void *env, int target_sig) "env=%p signal %d"
-- 
2.25.1

[PATCH 0/3] linux-user: Fix some minor nits

This patchset fixes up some minor nits in the linux-user code that I
noticed while I was reading code to assist with reviewing the
bsd-user signal handling.

thanks
-- PMM

Peter Maydell (3):
  linux-user: Remove unnecessary 'aligned' attribute from TaskState
  linux-user: Rename user_force_sig tracepoint to match function name
  linux-user: Return void from queue_signal()

 linux-user/qemu.h  | 6 +-
 linux-user/signal-common.h | 4 ++--
 linux-user/signal.c| 7 +++
 linux-user/trace-events| 2 +-
 4 files changed, 7 insertions(+), 12 deletions(-)

-- 
2.25.1

[PATCH v6 07/12] hw/ssi: Add a model of Xilinx Versal's OSPI flash memory controller

Add a model of Xilinx Versal's OSPI flash memory controller.

Signed-off-by: Francisco Iglesias 
---
 hw/ssi/meson.build|1 +
 hw/ssi/xlnx-versal-ospi.c | 1856 +
 include/hw/ssi/xlnx-versal-ospi.h |  111 +++
 3 files changed, 1968 insertions(+)
 create mode 100644 hw/ssi/xlnx-versal-ospi.c
 create mode 100644 include/hw/ssi/xlnx-versal-ospi.h

diff --git a/hw/ssi/meson.build b/hw/ssi/meson.build
index 3d6bc82ab1..0ded9cd092 100644
--- a/hw/ssi/meson.build
+++ b/hw/ssi/meson.build
@@ -7,5 +7,6 @@ softmmu_ss.add(when: 'CONFIG_SSI', if_true: files('ssi.c'))
 softmmu_ss.add(when: 'CONFIG_STM32F2XX_SPI', if_true: files('stm32f2xx_spi.c'))
 softmmu_ss.add(when: 'CONFIG_XILINX_SPI', if_true: files('xilinx_spi.c'))
 softmmu_ss.add(when: 'CONFIG_XILINX_SPIPS', if_true: files('xilinx_spips.c'))
+softmmu_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files('xlnx-versal-ospi.c'))
 softmmu_ss.add(when: 'CONFIG_IMX', if_true: files('imx_spi.c'))
 softmmu_ss.add(when: 'CONFIG_OMAP', if_true: files('omap_spi.c'))
diff --git a/hw/ssi/xlnx-versal-ospi.c b/hw/ssi/xlnx-versal-ospi.c
new file mode 100644
index 00..20f46b3692
--- /dev/null
+++ b/hw/ssi/xlnx-versal-ospi.c
@@ -0,0 +1,1856 @@
+/*
+ * QEMU model of Xilinx Versal's OSPI controller.
+ *
+ * Copyright (c) 2021 Xilinx Inc.
+ * Written by Francisco Iglesias 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "migration/vmstate.h"
+#include "hw/qdev-properties.h"
+#include "qemu/bitops.h"
+#include "qemu/log.h"
+#include "hw/irq.h"
+#include "hw/ssi/xlnx-versal-ospi.h"
+
+#ifndef XILINX_VERSAL_OSPI_ERR_DEBUG
+#define XILINX_VERSAL_OSPI_ERR_DEBUG 0
+#endif
+
+REG32(CONFIG_REG, 0x0)
+FIELD(CONFIG_REG, IDLE_FLD, 31, 1)
+FIELD(CONFIG_REG, DUAL_BYTE_OPCODE_EN_FLD, 30, 1)
+FIELD(CONFIG_REG, CRC_ENABLE_FLD, 29, 1)
+FIELD(CONFIG_REG, CONFIG_RESV2_FLD, 26, 3)
+FIELD(CONFIG_REG, PIPELINE_PHY_FLD, 25, 1)
+FIELD(CONFIG_REG, ENABLE_DTR_PROTOCOL_FLD, 24, 1)
+FIELD(CONFIG_REG, ENABLE_AHB_DECODER_FLD, 23, 1)
+FIELD(CONFIG_REG, MSTR_BAUD_DIV_FLD, 19, 4)
+FIELD(CONFIG_REG, ENTER_XIP_MODE_IMM_FLD, 18, 1)
+FIELD(CONFIG_REG, ENTER_XIP_MODE_FLD, 17, 1)
+FIELD(CONFIG_REG, ENB_AHB_ADDR_REMAP_FLD, 16, 1)
+FIELD(CONFIG_REG, ENB_DMA_IF_FLD, 15, 1)
+FIELD(CONFIG_REG, WR_PROT_FLASH_FLD, 14, 1)
+FIELD(CONFIG_REG, PERIPH_CS_LINES_FLD, 10, 4)
+FIELD(CONFIG_REG, PERIPH_SEL_DEC_FLD, 9, 1)
+FIELD(CONFIG_REG, ENB_LEGACY_IP_MODE_FLD, 8, 1)
+FIELD(CONFIG_REG, ENB_DIR_ACC_CTLR_FLD, 7, 1)
+FIELD(CONFIG_REG, RESET_CFG_FLD, 6, 1)
+FIELD(CONFIG_REG, RESET_PIN_FLD, 5, 1)
+FIELD(CONFIG_REG, HOLD_PIN_FLD, 4, 1)
+FIELD(CONFIG_REG, PHY_MODE_ENABLE_FLD, 3, 1)
+FIELD(CONFIG_REG, SEL_CLK_PHASE_FLD, 2, 1)
+FIELD(CONFIG_REG, SEL_CLK_POL_FLD, 1, 1)
+FIELD(CONFIG_REG, ENB_SPI_FLD, 0, 1)
+REG32(DEV_INSTR_RD_CONFIG_REG, 0x4)
+FIELD(DEV_INSTR_RD_CONFIG_REG, RD_INSTR_RESV5_FLD, 29, 3)
+FIELD(DEV_INSTR_RD_CONFIG_REG, DUMMY_RD_CLK_CYCLES_FLD, 24, 5)
+FIELD(DEV_INSTR_RD_CONFIG_REG, RD_INSTR_RESV4_FLD, 21, 3)
+FIELD(DEV_INSTR_RD_CONFIG_REG, MODE_BIT_ENABLE_FLD, 20, 1)
+FIELD(DEV_INSTR_RD_CONFIG_REG, RD_INSTR_RESV3_FLD, 18, 2)
+FIELD(DEV_INSTR_RD_CONFIG_REG, DATA_XFER_TYPE_EXT_MODE_FLD, 16, 2)
+FIELD(DEV_INSTR_RD_CONFIG_REG, RD_INSTR_RESV2_FLD, 14, 2)
+FIELD(DEV_INSTR_RD_CONFIG_REG, ADDR_XFER_TYPE_STD_MODE_FLD, 12, 2)
+FIELD(DEV_INSTR_RD_CONFIG_REG, PRED_DIS_FLD, 11, 1)
+FIELD(DEV_INSTR_RD_CONFIG_REG, DDR_EN_FLD, 10, 1)
+FIELD(DEV_INSTR_RD_CONFIG_REG, INSTR_TYPE_FLD, 8, 2)
+FIELD(DEV_INSTR_RD_CONFIG_REG, RD_OPCODE_NON_XIP_FLD, 0, 8)
+REG32(DEV_INSTR_WR_CONFIG_REG, 0x8)
+FIELD(DEV_INSTR_WR_CONFIG_REG, WR_INSTR_RESV4_FLD, 29, 3)
+FIELD(DEV_INSTR_WR_CONFIG_REG, DUMMY_WR_CLK_CYCLES_FLD, 24, 5)
+FIELD(DEV_INSTR_WR_CONFIG_REG, WR_IN

[PATCH v6 06/12] hw/dma/xlnx_csu_dma: Implement the DMA control interface

Implement the DMA control interface for allowing direct control of DMA
operations from inside peripheral models embedding (and reusing) the
Xilinx CSU DMA.
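
As a rough illustration of the approach taken in the diff below (program the address registers, write the length to the size register to kick off the transfer, then treat a size of zero as success), here is a minimal standalone sketch. It does not use QEMU's register API; `MockCsuDmaRegs`, `mock_size_write`, `mock_ctrl_if_read` and the `budget` field are hypothetical names invented for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file; field names only loosely follow the patch. */
typedef struct {
    uint32_t addr;      /* low 32 bits of the transfer address */
    uint32_t addr_msb;  /* high 32 bits of the transfer address */
    uint32_t size;      /* bytes remaining; 0 once a transfer completed */
    uint32_t budget;    /* toy limit on how many bytes the engine can move */
} MockCsuDmaRegs;

/* Toy side effect of writing the size register: move what the budget allows. */
static void mock_size_write(MockCsuDmaRegs *r, uint32_t len)
{
    uint32_t moved = len < r->budget ? len : r->budget;

    r->budget -= moved;
    r->size = len - moved;
}

/* Returns 0 when all 'len' bytes were moved, -1 otherwise. */
static int mock_ctrl_if_read(MockCsuDmaRegs *r, uint64_t addr, uint32_t len)
{
    assert(r);

    /* Split the 64-bit address across the two 32-bit address registers. */
    r->addr = (uint32_t)addr;
    r->addr_msb = (uint32_t)(addr >> 32);

    /* Writing the size register is what starts the transfer. */
    mock_size_write(r, len);

    return r->size == 0 ? 0 : -1;
}
```

The point of the sketch is that the control-interface call reuses the same register side effects a software driver would trigger, so status and results remain visible through the engine's own registers.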

Signed-off-by: Francisco Iglesias 
---
 hw/dma/xlnx_csu_dma.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/hw/dma/xlnx_csu_dma.c b/hw/dma/xlnx_csu_dma.c
index 896bb3574d..58860d9f19 100644
--- a/hw/dma/xlnx_csu_dma.c
+++ b/hw/dma/xlnx_csu_dma.c
@@ -30,6 +30,7 @@
 #include "hw/stream.h"
 #include "hw/register.h"
 #include "hw/dma/xlnx_csu_dma.h"
+#include "hw/dma/dma-ctrl-if.h"
 
 /*
  * Ref: UG1087 (v1.7) February 8, 2019
@@ -472,6 +473,21 @@ static uint64_t addr_msb_pre_write(RegisterInfo *reg, uint64_t val)
 return val & R_ADDR_MSB_ADDR_MSB_MASK;
 }
 
+static MemTxResult xlnx_csu_dma_dma_ctrl_if_read(DmaCtrlIf *dma, hwaddr addr,
+ uint32_t len)
+{
+XlnxCSUDMA *s = XLNX_CSU_DMA(dma);
+RegisterInfo *reg = &s->regs_info[R_SIZE];
+uint64_t we = MAKE_64BIT_MASK(0, 4 * 8);
+
+s->regs[R_ADDR] = addr;
+s->regs[R_ADDR_MSB] = (uint64_t)addr >> 32;
+
+register_write(reg, len, we, object_get_typename(OBJECT(s)), false);
+
+return (s->regs[R_SIZE] == 0) ? MEMTX_OK : MEMTX_ERROR;
+}
+
 static const RegisterAccessInfo *xlnx_csu_dma_regs_info[] = {
 #define DMACH_REGINFO(NAME, snd)  \
 (const RegisterAccessInfo []) {   \
@@ -696,6 +712,7 @@ static void xlnx_csu_dma_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 StreamSinkClass *ssc = STREAM_SINK_CLASS(klass);
+DmaCtrlIfClass *dcic = DMA_CTRL_IF_CLASS(klass);
 
 dc->reset = xlnx_csu_dma_reset;
 dc->realize = xlnx_csu_dma_realize;
@@ -704,6 +721,8 @@ static void xlnx_csu_dma_class_init(ObjectClass *klass, void *data)
 
 ssc->push = xlnx_csu_dma_stream_push;
 ssc->can_push = xlnx_csu_dma_stream_can_push;
+
+dcic->read = xlnx_csu_dma_dma_ctrl_if_read;
 }
 
 static void xlnx_csu_dma_init(Object *obj)
@@ -731,6 +750,7 @@ static const TypeInfo xlnx_csu_dma_info = {
 .instance_init = xlnx_csu_dma_init,
 .interfaces = (InterfaceInfo[]) {
 { TYPE_STREAM_SINK },
+{ TYPE_DMA_CTRL_IF },
 { }
 }
 };
-- 
2.11.0

[PATCH v6 11/12] MAINTAINERS: Add an entry for Xilinx Versal OSPI

List myself as maintainer for the Xilinx Versal OSPI controller.

Signed-off-by: Francisco Iglesias 
Reviewed-by: Edgar E. Iglesias 
Reviewed-by: Peter Maydell 
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6ccdec7f02..0e31569d65 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -963,6 +963,12 @@ F: hw/display/dpcd.c
 F: include/hw/display/dpcd.h
 F: docs/system/arm/xlnx-versal-virt.rst
 
+Xilinx Versal OSPI
+M: Francisco Iglesias 
+S: Maintained
+F: hw/ssi/xlnx-versal-ospi.c
+F: include/hw/ssi/xlnx-versal-ospi.h
+
 ARM ACPI Subsystem
 M: Shannon Zhao 
 L: qemu-...@nongnu.org
-- 
2.11.0

[PATCH v6 04/12] include/hw/dma/xlnx_csu_dma: Add in missing includes in the header

Add in the missing includes in the header for being able to build the DMA
model when reusing it.

Signed-off-by: Francisco Iglesias 
Reviewed-by: Peter Maydell 
---
 include/hw/dma/xlnx_csu_dma.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/hw/dma/xlnx_csu_dma.h b/include/hw/dma/xlnx_csu_dma.h
index 9e9dc551e9..28806628b1 100644
--- a/include/hw/dma/xlnx_csu_dma.h
+++ b/include/hw/dma/xlnx_csu_dma.h
@@ -21,6 +21,11 @@
 #ifndef XLNX_CSU_DMA_H
 #define XLNX_CSU_DMA_H
 
+#include "hw/sysbus.h"
+#include "hw/register.h"
+#include "hw/ptimer.h"
+#include "hw/stream.h"
+
 #define TYPE_XLNX_CSU_DMA "xlnx.csu_dma"
 
 #define XLNX_CSU_DMA_R_MAX (0x2c / 4)
-- 
2.11.0

[PATCH v6 09/12] hw/block/m25p80: Add support for Micron Xccela flash mt35xu01g

Add support for Micron Xccela flash mt35xu01g.

Signed-off-by: Francisco Iglesias 
Reviewed-by: Edgar E. Iglesias 
---
 hw/block/m25p80.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index b77503dc84..c6bf3c6bfa 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -255,6 +255,8 @@ static const FlashPartInfo known_devices[] = {
 { INFO("n25q512a",0x20ba20,  0,  64 << 10, 1024, ER_4K) },
 { INFO("n25q512ax3",  0x20ba20,  0x1000,  64 << 10, 1024, ER_4K) },
 { INFO("mt25ql512ab", 0x20ba20, 0x1044, 64 << 10, 1024, ER_4K | ER_32K) },
+{ INFO_STACKED("mt35xu01g", 0x2c5b1b, 0x104100, 128 << 10, 1024,
+   ER_4K | ER_32K, 2) },
 { INFO_STACKED("n25q00",0x20ba21, 0x1000, 64 << 10, 2048, ER_4K, 4) },
 { INFO_STACKED("n25q00a",   0x20bb21, 0x1000, 64 << 10, 2048, ER_4K, 4) },
 { INFO_STACKED("mt25ql01g", 0x20ba21, 0x1040, 64 << 10, 2048, ER_4K, 2) },
-- 
2.11.0

[PATCH v6 05/12] hw/dma: Add the DMA control interface

An option on real hardware when embedding a DMA engine into a peripheral
is to make the peripheral control the engine through a custom DMA control
(hardware) interface between the two. Software drivers in this scenario
configure and trigger DMA operations through the controlling peripheral's
register API (for example, writing a specific bit in a register could
propagate down to a transfer start signal on the DMA control interface).
At the same time the status, results and interrupts for the transfer might
still be intended to be read and caught through the DMA engine's register
API (and signals).

This patch adds a QEMU DMA control interface that can be used for
modelling the above scenario. Through this new interface a peripheral model
embedding a DMA engine model will be able to directly initiate transfers
through the DMA. At the same time the transfer state, result and
completion signaling will be read and caught through the DMA engine
model's register API and signaling.
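
The dispatch pattern described above can be sketched outside QEMU as a plain C method table. This is only an illustration under stated assumptions: it does not use QOM, and every `Mock*`/`mock_*` name below is hypothetical and stands in for the real `DmaCtrlIf`, `DmaCtrlIfClass`, `MemTxResult` and `dma_ctrl_if_read` from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for QEMU's MemTxResult and hwaddr. */
typedef enum { MOCK_MEMTX_OK = 0, MOCK_MEMTX_ERROR = 1 } MockMemTxResult;
typedef uint64_t mock_hwaddr;

typedef struct MockDmaCtrlIf MockDmaCtrlIf;

/* The "class": the methods a DMA engine provides to its embedding peripheral. */
typedef struct {
    MockMemTxResult (*read)(MockDmaCtrlIf *dma, mock_hwaddr addr, uint32_t len);
} MockDmaCtrlIfClass;

struct MockDmaCtrlIf {
    const MockDmaCtrlIfClass *klass;
};

/* A toy DMA engine that copies from a source buffer into a destination. */
typedef struct {
    MockDmaCtrlIf parent;   /* first member, so the pointer cast below is valid */
    uint8_t src[64];
    uint8_t dst[64];
    uint32_t status_done;   /* status still lives in the engine's own state */
} MockDmaEngine;

static MockMemTxResult mock_engine_read(MockDmaCtrlIf *dma, mock_hwaddr addr,
                                        uint32_t len)
{
    MockDmaEngine *e = (MockDmaEngine *)dma;

    /* Reject transfers that would run past the end of the source buffer. */
    if (addr > sizeof(e->src) || len > sizeof(e->src) - addr) {
        return MOCK_MEMTX_ERROR;
    }
    memcpy(e->dst, e->src + addr, len);
    e->status_done = 1;
    return MOCK_MEMTX_OK;
}

static const MockDmaCtrlIfClass mock_engine_class = {
    .read = mock_engine_read,
};

/* Wrapper playing the role of dma_ctrl_if_read(): look up and call the method. */
static MockMemTxResult mock_dma_ctrl_if_read(MockDmaCtrlIf *dma,
                                             mock_hwaddr addr, uint32_t len)
{
    assert(dma && dma->klass && dma->klass->read);
    return dma->klass->read(dma, addr, len);
}

/* The embedding peripheral only holds a handle to the interface. */
typedef struct {
    MockDmaCtrlIf *dma;
} MockPeripheral;

/* Called when the driver sets the peripheral's "start transfer" bit. */
static MockMemTxResult mock_peripheral_start(MockPeripheral *p,
                                             mock_hwaddr addr, uint32_t len)
{
    return mock_dma_ctrl_if_read(p->dma, addr, len);
}
```

In this sketch the peripheral never touches the engine's internals: it starts the transfer through the interface, while completion state (`status_done` here) stays with the engine, mirroring the split between control and status described above.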

Signed-off-by: Francisco Iglesias 
---
 hw/dma/dma-ctrl-if.c | 30 +++
 hw/dma/meson.build   |  1 +
 include/hw/dma/dma-ctrl-if.h | 58 
 3 files changed, 89 insertions(+)
 create mode 100644 hw/dma/dma-ctrl-if.c
 create mode 100644 include/hw/dma/dma-ctrl-if.h

diff --git a/hw/dma/dma-ctrl-if.c b/hw/dma/dma-ctrl-if.c
new file mode 100644
index 00..895edac277
--- /dev/null
+++ b/hw/dma/dma-ctrl-if.c
@@ -0,0 +1,30 @@
+/*
+ * DMA control interface.
+ *
+ * Copyright (c) 2021 Xilinx Inc.
+ * Written by Francisco Iglesias 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include "qemu/osdep.h"
+#include "exec/hwaddr.h"
+#include "hw/dma/dma-ctrl-if.h"
+
+MemTxResult dma_ctrl_if_read(DmaCtrlIf *dma, hwaddr addr, uint32_t len)
+{
+DmaCtrlIfClass *dcic =  DMA_CTRL_IF_GET_CLASS(dma);
+return dcic->read(dma, addr, len);
+}
+
+static const TypeInfo dma_ctrl_if_info = {
+.name  = TYPE_DMA_CTRL_IF,
+.parent= TYPE_INTERFACE,
+.class_size = sizeof(DmaCtrlIfClass),
+};
+
+static void dma_ctrl_if_register_types(void)
+{
+type_register_static(&dma_ctrl_if_info);
+}
+
+type_init(dma_ctrl_if_register_types)
diff --git a/hw/dma/meson.build b/hw/dma/meson.build
index f3f0661bc3..c43c067856 100644
--- a/hw/dma/meson.build
+++ b/hw/dma/meson.build
@@ -14,3 +14,4 @@ softmmu_ss.add(when: 'CONFIG_PXA2XX', if_true: files('pxa2xx_dma.c'))
 softmmu_ss.add(when: 'CONFIG_RASPI', if_true: files('bcm2835_dma.c'))
 softmmu_ss.add(when: 'CONFIG_SIFIVE_PDMA', if_true: files('sifive_pdma.c'))
 softmmu_ss.add(when: 'CONFIG_XLNX_CSU_DMA', if_true: files('xlnx_csu_dma.c'))
+common_ss.add(when: 'CONFIG_XILINX_AXI', if_true: files('dma-ctrl-if.c'))
diff --git a/include/hw/dma/dma-ctrl-if.h b/include/hw/dma/dma-ctrl-if.h
new file mode 100644
index 00..0662149e14
--- /dev/null
+++ b/include/hw/dma/dma-ctrl-if.h
@@ -0,0 +1,58 @@
+/*
+ * DMA control interface.
+ *
+ * Copyright (c) 2021 Xilinx Inc.
+ * Written by Francisco Iglesias 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef HW_DMA_CTRL_IF_H
+#define HW_DMA_CTRL_IF_H
+
+#include "hw/hw.h"
+#include "exec/memory.h"
+#include "qom/object.h"
+
+#define TYPE_DMA_CTRL_IF "dma-ctrl-if"
+typedef struct DmaCtrlIfClass DmaCtrlIfClass;
+DECLARE_CLASS_CHECKERS(DmaCtrlIfClass, DMA_CTRL_IF,
+   TYPE_DMA_CTRL_IF)
+
+#define DMA_CTRL_IF(obj) \
+ INTERFACE_CHECK(DmaCtrlIf, (obj), TYPE_DMA_CTRL_IF)
+
+typedef struct DmaCtrlIf {
+Object Parent;
+} DmaCtrlIf;
+
+typedef struct DmaCtrlIfClass {
+InterfaceClass parent;
+
+/*
+ * read: Start a read transfer on the DMA engine implementing the DMA
+ * control interface
+ *
+ * @dma_ctrl: the DMA engine implementing this interface
+ * @addr: the address to read
+ * @len: the number of bytes to read at 'addr'
+ *
+ * @return a MemTxResult indicating whether the operation succeeded ('len'
+ * bytes were read) or failed.
+ */
+MemTxResult (*read)(DmaCtrlIf *dma, hwaddr addr, uint32_t len);
+} DmaCtrlIfClass;
+
+/*
+ * Start a read transfer on a DMA engine implementing the DMA control
+ * interface.
+ *
+ * @dma_ctrl: the DMA engine implementing this interface
+ * @addr: the address to read
+ * @len: the number of bytes to read at 'addr'
+ *
+ * @return a MemTxResult indicating whether the operation succeeded ('len'
+ * bytes were read) or failed.
+ */
+MemTxResult dma_ctrl_if_read(DmaCtrlIf *dma, hwaddr addr, uint32_t len);
+
+#endif /* HW_DMA_CTRL_IF_H */
-- 
2.11.0

[PATCH v6 02/12] hw/arm/xlnx-versal: 'Or' the interrupts from the BBRAM and RTC models

Add an orgate and 'or' the interrupts from the BBRAM and RTC models.
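
For readers unfamiliar with the idea, the behaviour of such an or-gate can be sketched in standalone C as follows: the gate remembers the level of each input line and drives its single output high while any input is high. This is a simplified model written for illustration, not QEMU's `TYPE_OR_IRQ` implementation; `MockOrGate`, `mock_or_gate_set` and `MOCK_OR_MAX_LINES` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_OR_MAX_LINES 16

/* Hypothetical or-gate: several input lines, one shared output line. */
typedef struct {
    bool levels[MOCK_OR_MAX_LINES]; /* last level seen on each input */
    size_t num_lines;               /* number of inputs actually in use */
    bool out;                       /* level driven on the shared output */
} MockOrGate;

/* Update input 'n' and recompute the output as the OR of all inputs. */
static void mock_or_gate_set(MockOrGate *g, size_t n, bool level)
{
    bool out = false;

    assert(g->num_lines <= MOCK_OR_MAX_LINES);
    if (n >= g->num_lines) {
        return; /* ignore lines that are not wired up */
    }

    g->levels[n] = level;
    for (size_t i = 0; i < g->num_lines; i++) {
        out |= g->levels[i];
    }
    g->out = out;
}
```

With two inputs (one each for the RTC and BBRAM models), the shared output stays asserted until both sources have deasserted their lines.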

Signed-off-by: Francisco Iglesias 
Reviewed-by: Peter Maydell 
---
 hw/arm/xlnx-versal-virt.c|  2 +-
 hw/arm/xlnx-versal.c | 28 ++--
 include/hw/arm/xlnx-versal.h |  5 +++--
 3 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index 0c5edc898e..8ea9979710 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -365,7 +365,7 @@ static void fdt_add_bbram_node(VersalVirt *s)
 qemu_fdt_add_subnode(s->fdt, name);
 
 qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
-   GIC_FDT_IRQ_TYPE_SPI, VERSAL_BBRAM_APB_IRQ_0,
+   GIC_FDT_IRQ_TYPE_SPI, VERSAL_PMC_APB_IRQ,
GIC_FDT_IRQ_FLAGS_LEVEL_HI);
 qemu_fdt_setprop(s->fdt, name, "interrupt-names",
  interrupt_names, sizeof(interrupt_names));
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index b2705b6925..fefd00b57c 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -25,6 +25,8 @@
 #define XLNX_VERSAL_ACPU_TYPE ARM_CPU_TYPE_NAME("cortex-a72")
 #define GEM_REVISION 0x40070106
 
+#define VERSAL_NUM_PMC_APB_IRQS 2
+
 static void versal_create_apu_cpus(Versal *s)
 {
 int i;
@@ -260,6 +262,25 @@ static void versal_create_sds(Versal *s, qemu_irq *pic)
 }
 }
 
+static void versal_create_pmc_apb_irq_orgate(Versal *s, qemu_irq *pic)
+{
+DeviceState *orgate;
+
+/*
+ * The VERSAL_PMC_APB_IRQ is an 'or' of the interrupts from the following
+ * models:
+ *  - RTC
+ *  - BBRAM
+ */
+object_initialize_child(OBJECT(s), "pmc-apb-irq-orgate",
+&s->pmc.apb_irq_orgate, TYPE_OR_IRQ);
+orgate = DEVICE(&s->pmc.apb_irq_orgate);
+object_property_set_int(OBJECT(orgate),
+"num-lines", VERSAL_NUM_PMC_APB_IRQS, &error_fatal);
+qdev_realize(orgate, NULL, &error_fatal);
+qdev_connect_gpio_out(orgate, 0, pic[VERSAL_PMC_APB_IRQ]);
+}
+
 static void versal_create_rtc(Versal *s, qemu_irq *pic)
 {
 SysBusDevice *sbd;
@@ -277,7 +298,8 @@ static void versal_create_rtc(Versal *s, qemu_irq *pic)
  * TODO: Connect the ALARM and SECONDS interrupts once our RTC model
  * supports them.
  */
-sysbus_connect_irq(sbd, 1, pic[VERSAL_RTC_APB_ERR_IRQ]);
+sysbus_connect_irq(sbd, 1,
+   qdev_get_gpio_in(DEVICE(&s->pmc.apb_irq_orgate), 0));
 }
 
 static void versal_create_xrams(Versal *s, qemu_irq *pic)
@@ -328,7 +350,8 @@ static void versal_create_bbram(Versal *s, qemu_irq *pic)
 sysbus_realize(sbd, &error_fatal);
 memory_region_add_subregion(&s->mr_ps, MM_PMC_BBRAM_CTRL,
 sysbus_mmio_get_region(sbd, 0));
-sysbus_connect_irq(sbd, 0, pic[VERSAL_BBRAM_APB_IRQ_0]);
+sysbus_connect_irq(sbd, 0,
+   qdev_get_gpio_in(DEVICE(&s->pmc.apb_irq_orgate), 1));
 }
 
 static void versal_realize_efuse_part(Versal *s, Object *dev, hwaddr base)
@@ -455,6 +478,7 @@ static void versal_realize(DeviceState *dev, Error **errp)
 versal_create_gems(s, pic);
 versal_create_admas(s, pic);
 versal_create_sds(s, pic);
+versal_create_pmc_apb_irq_orgate(s, pic);
 versal_create_rtc(s, pic);
 versal_create_xrams(s, pic);
 versal_create_bbram(s, pic);
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
index 895ba12c61..62fb6f0a68 100644
--- a/include/hw/arm/xlnx-versal.h
+++ b/include/hw/arm/xlnx-versal.h
@@ -85,6 +85,8 @@ struct Versal {
 XlnxEFuse efuse;
 XlnxVersalEFuseCtrl efuse_ctrl;
 XlnxVersalEFuseCache efuse_cache;
+
+qemu_or_irq apb_irq_orgate;
 } pmc;
 
 struct {
@@ -111,8 +113,7 @@ struct Versal {
 #define VERSAL_GEM1_WAKE_IRQ_0 59
 #define VERSAL_ADMA_IRQ_0  60
 #define VERSAL_XRAM_IRQ_0  79
-#define VERSAL_BBRAM_APB_IRQ_0 121
-#define VERSAL_RTC_APB_ERR_IRQ 121
+#define VERSAL_PMC_APB_IRQ 121
 #define VERSAL_SD0_IRQ_0   126
 #define VERSAL_EFUSE_IRQ   139
 #define VERSAL_RTC_ALARM_IRQ   142
-- 
2.11.0

[PATCH v6 03/12] hw/arm/xlnx-versal: Connect Versal's PMC SLCR

Connect Versal's PMC SLCR (system-level control registers) model.

Signed-off-by: Francisco Iglesias 
---
 hw/arm/xlnx-versal.c | 71 +++-
 include/hw/arm/xlnx-versal.h |  5 
 2 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index fefd00b57c..c8c0c102c7 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -21,11 +21,13 @@
 #include "kvm_arm.h"
 #include "hw/misc/unimp.h"
 #include "hw/arm/xlnx-versal.h"
+#include "qemu/log.h"
+#include "hw/sysbus.h"
 
 #define XLNX_VERSAL_ACPU_TYPE ARM_CPU_TYPE_NAME("cortex-a72")
 #define GEM_REVISION 0x40070106
 
-#define VERSAL_NUM_PMC_APB_IRQS 2
+#define VERSAL_NUM_PMC_APB_IRQS 3
 
 static void versal_create_apu_cpus(Versal *s)
 {
@@ -271,6 +273,7 @@ static void versal_create_pmc_apb_irq_orgate(Versal *s, qemu_irq *pic)
  * models:
  *  - RTC
  *  - BBRAM
+ *  - PMC SLCR
  */
 object_initialize_child(OBJECT(s), "pmc-apb-irq-orgate",
 &s->pmc.apb_irq_orgate, TYPE_OR_IRQ);
@@ -392,6 +395,23 @@ static void versal_create_efuse(Versal *s, qemu_irq *pic)
 sysbus_connect_irq(SYS_BUS_DEVICE(ctrl), 0, pic[VERSAL_EFUSE_IRQ]);
 }
 
+static void versal_create_pmc_iou_slcr(Versal *s, qemu_irq *pic)
+{
+SysBusDevice *sbd;
+
+object_initialize_child(OBJECT(s), "versal-pmc-iou-slcr", &s->pmc.iou.slcr,
+TYPE_XILINX_VERSAL_PMC_IOU_SLCR);
+
+sbd = SYS_BUS_DEVICE(&s->pmc.iou.slcr);
+sysbus_realize(sbd, &error_fatal);
+
+memory_region_add_subregion(&s->mr_ps, MM_PMC_PMC_IOU_SLCR,
+sysbus_mmio_get_region(sbd, 0));
+
+sysbus_connect_irq(sbd, 0,
+   qdev_get_gpio_in(DEVICE(&s->pmc.apb_irq_orgate), 2));
+}
+
 /* This takes the board allocated linear DDR memory and creates aliases
  * for each split DDR range/aperture on the Versal address map.
  */
@@ -448,8 +468,31 @@ static void versal_unimp_area(Versal *s, const char *name,
 memory_region_add_subregion(mr, base, mr_dev);
 }
 
+static void versal_unimp_sd_emmc_sel(void *opaque, int n, int level)
+{
+qemu_log_mask(LOG_UNIMP,
+  "Selecting between enabling SD mode or eMMC mode on "
+  "controller %d is not yet implemented\n", n);
+}
+
+static void versal_unimp_qspi_ospi_mux_sel(void *opaque, int n, int level)
+{
+qemu_log_mask(LOG_UNIMP,
+  "Selecting between enabling the QSPI or OSPI linear address "
+  "region is not yet implemented\n");
+}
+
+static void versal_unimp_irq_parity_imr(void *opaque, int n, int level)
+{
+qemu_log_mask(LOG_UNIMP,
+  "PMC SLCR parity interrupt behaviour "
+  "is not yet implemented\n");
+}
+
 static void versal_unimp(Versal *s)
 {
+qemu_irq gpio_in;
+
 versal_unimp_area(s, "psm", &s->mr_ps,
 MM_PSM_START, MM_PSM_END - MM_PSM_START);
 versal_unimp_area(s, "crl", &s->mr_ps,
@@ -464,6 +507,31 @@ static void versal_unimp(Versal *s)
 MM_IOU_SCNTR, MM_IOU_SCNTR_SIZE);
 versal_unimp_area(s, "iou-scntr-seucre", &s->mr_ps,
 MM_IOU_SCNTRS, MM_IOU_SCNTRS_SIZE);
+
+qdev_init_gpio_in_named(DEVICE(s), versal_unimp_sd_emmc_sel,
+"sd-emmc-sel-dummy", 2);
+qdev_init_gpio_in_named(DEVICE(s), versal_unimp_qspi_ospi_mux_sel,
+"qspi-ospi-mux-sel-dummy", 1);
+qdev_init_gpio_in_named(DEVICE(s), versal_unimp_irq_parity_imr,
+"irq-parity-imr-dummy", 1);
+
+gpio_in = qdev_get_gpio_in_named(DEVICE(s), "sd-emmc-sel-dummy", 0);
+qdev_connect_gpio_out_named(DEVICE(&s->pmc.iou.slcr), "sd-emmc-sel", 0,
+gpio_in);
+
+gpio_in = qdev_get_gpio_in_named(DEVICE(s), "sd-emmc-sel-dummy", 1);
+qdev_connect_gpio_out_named(DEVICE(&s->pmc.iou.slcr), "sd-emmc-sel", 1,
+gpio_in);
+
+gpio_in = qdev_get_gpio_in_named(DEVICE(s), "qspi-ospi-mux-sel-dummy", 0);
+qdev_connect_gpio_out_named(DEVICE(&s->pmc.iou.slcr),
+"qspi-ospi-mux-sel", 0,
+gpio_in);
+
+gpio_in = qdev_get_gpio_in_named(DEVICE(s), "irq-parity-imr-dummy", 0);
+qdev_connect_gpio_out_named(DEVICE(&s->pmc.iou.slcr),
+SYSBUS_DEVICE_GPIO_IRQ, 0,
+gpio_in);
 }
 
 static void versal_realize(DeviceState *dev, Error **errp)
@@ -483,6 +551,7 @@ static void versal_realize(DeviceState *dev, Error **errp)
 versal_create_xrams(s, pic);
 versal_create_bbram(s, pic);
 versal_create_efuse(s, pic);
+versal_create_pmc_iou_slcr(s, pic);
 versal_map_ddr(s);
 versal_unimp(s);
 
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
index 62fb6f0a68..811df73350 100644
--- a/include/hw/arm

[PATCH v6 01/12] hw/misc: Add a model of Versal's PMC SLCR

Add a model of Versal's PMC SLCR (system-level control registers).

Signed-off-by: Francisco Iglesias 
Signed-off-by: Edgar E. Iglesias 
Reviewed-by: Peter Maydell 
---
 hw/misc/meson.build|5 +-
 hw/misc/xlnx-versal-pmc-iou-slcr.c | 1446 
 include/hw/misc/xlnx-versal-pmc-iou-slcr.h |   78 ++
 3 files changed, 1528 insertions(+), 1 deletion(-)
 create mode 100644 hw/misc/xlnx-versal-pmc-iou-slcr.c
 create mode 100644 include/hw/misc/xlnx-versal-pmc-iou-slcr.h

diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 3f41a3a5b2..e82628a618 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -84,7 +84,10 @@ softmmu_ss.add(when: 'CONFIG_RASPI', if_true: files(
 ))
 softmmu_ss.add(when: 'CONFIG_SLAVIO', if_true: files('slavio_misc.c'))
 softmmu_ss.add(when: 'CONFIG_ZYNQ', if_true: files('zynq_slcr.c'))
-softmmu_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: 
files('xlnx-versal-xramc.c'))
+softmmu_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files(
+  'xlnx-versal-xramc.c',
+  'xlnx-versal-pmc-iou-slcr.c',
+))
 softmmu_ss.add(when: 'CONFIG_STM32F2XX_SYSCFG', if_true: 
files('stm32f2xx_syscfg.c'))
 softmmu_ss.add(when: 'CONFIG_STM32F4XX_SYSCFG', if_true: 
files('stm32f4xx_syscfg.c'))
 softmmu_ss.add(when: 'CONFIG_STM32F4XX_EXTI', if_true: 
files('stm32f4xx_exti.c'))
diff --git a/hw/misc/xlnx-versal-pmc-iou-slcr.c 
b/hw/misc/xlnx-versal-pmc-iou-slcr.c
new file mode 100644
index 00..07b7ebc217
--- /dev/null
+++ b/hw/misc/xlnx-versal-pmc-iou-slcr.c
@@ -0,0 +1,1446 @@
+/*
+ * QEMU model of Versal's PMC IOU SLCR (system level control registers)
+ *
+ * Copyright (c) 2021 Xilinx Inc.
+ * Written by Edgar E. Iglesias 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/register.h"
+#include "hw/irq.h"
+#include "qemu/bitops.h"
+#include "qemu/log.h"
+#include "migration/vmstate.h"
+#include "hw/qdev-properties.h"
+#include "hw/misc/xlnx-versal-pmc-iou-slcr.h"
+
+#ifndef XILINX_VERSAL_PMC_IOU_SLCR_ERR_DEBUG
+#define XILINX_VERSAL_PMC_IOU_SLCR_ERR_DEBUG 0
+#endif
+
+REG32(MIO_PIN_0, 0x0)
+FIELD(MIO_PIN_0, L3_SEL, 7, 3)
+FIELD(MIO_PIN_0, L2_SEL, 5, 2)
+FIELD(MIO_PIN_0, L1_SEL, 3, 2)
+FIELD(MIO_PIN_0, L0_SEL, 1, 2)
+REG32(MIO_PIN_1, 0x4)
+FIELD(MIO_PIN_1, L3_SEL, 7, 3)
+FIELD(MIO_PIN_1, L2_SEL, 5, 2)
+FIELD(MIO_PIN_1, L1_SEL, 3, 2)
+FIELD(MIO_PIN_1, L0_SEL, 1, 2)
+REG32(MIO_PIN_2, 0x8)
+FIELD(MIO_PIN_2, L3_SEL, 7, 3)
+FIELD(MIO_PIN_2, L2_SEL, 5, 2)
+FIELD(MIO_PIN_2, L1_SEL, 3, 2)
+FIELD(MIO_PIN_2, L0_SEL, 1, 2)
+REG32(MIO_PIN_3, 0xc)
+FIELD(MIO_PIN_3, L3_SEL, 7, 3)
+FIELD(MIO_PIN_3, L2_SEL, 5, 2)
+FIELD(MIO_PIN_3, L1_SEL, 3, 2)
+FIELD(MIO_PIN_3, L0_SEL, 1, 2)
+REG32(MIO_PIN_4, 0x10)
+FIELD(MIO_PIN_4, L3_SEL, 7, 3)
+FIELD(MIO_PIN_4, L2_SEL, 5, 2)
+FIELD(MIO_PIN_4, L1_SEL, 3, 2)
+FIELD(MIO_PIN_4, L0_SEL, 1, 2)
+REG32(MIO_PIN_5, 0x14)
+FIELD(MIO_PIN_5, L3_SEL, 7, 3)
+FIELD(MIO_PIN_5, L2_SEL, 5, 2)
+FIELD(MIO_PIN_5, L1_SEL, 3, 2)
+FIELD(MIO_PIN_5, L0_SEL, 1, 2)
+REG32(MIO_PIN_6, 0x18)
+FIELD(MIO_PIN_6, L3_SEL, 7, 3)
+FIELD(MIO_PIN_6, L2_SEL, 5, 2)
+FIELD(MIO_PIN_6, L1_SEL, 3, 2)
+FIELD(MIO_PIN_6, L0_SEL, 1, 2)
+REG32(MIO_PIN_7, 0x1c)
+FIELD(MIO_PIN_7, L3_SEL, 7, 3)
+FIELD(MIO_PIN_7, L2_SEL, 5, 2)
+FIELD(MIO_PIN_7, L1_SEL, 3, 2)
+FIELD(MIO_PIN_7, L0_SEL, 1, 2)
+REG32(MIO_PIN_8, 0x20)
+FIELD(MIO_PIN_8, L3_SEL, 7, 3)
+FIELD(MIO_PIN_8, L2_SEL, 5, 2)
+FIELD(MIO_PIN_8, L1_SEL, 3, 2)
+FIELD(MIO_PIN_8, L0_SEL, 1, 2)
+REG32(MIO_PIN_9, 0x24)
+FIELD(MIO_PIN_9, L3_SEL, 7, 3)
+FIELD(MIO_PIN_9, L2_SEL, 5, 2)
+FIELD(MIO_PIN_9, L1_SEL, 3, 2)
+FIELD(MIO_PIN_9, L0_SEL, 1, 2)
+REG32(MIO_PIN_10, 0x28)
+FIELD(MIO_PIN_10, L3_SEL, 7, 3)
+FIELD(MIO_PIN_10, L2_SEL, 5, 2)
+FIELD(MIO_PIN_10, L1_SEL, 3, 2)
+FIE

Re: [PATCH v5 12/12] docs/devel: Add documentation for the DMA control interface

On [2022 Jan 07] Fri 16:07:17, Peter Maydell wrote:
> On Tue, 14 Dec 2021 at 11:04, Francisco Iglesias
>  wrote:
> >
> > Also, since being the author, list myself as maintainer for the file.
> >
> > Signed-off-by: Francisco Iglesias 
> 
> 
> > +DmaCtrlIfClass
> > +--
> > +
> > +The ``DmaCtrlIfClass`` contains the interface methods that can be
> > +implemented by a DMA engine.
> > +
> > +.. code-block:: c
> > +
> > +typedef struct DmaCtrlIfClass {
> > +InterfaceClass parent;
> > +
> > +/*
> > + * read: Start a read transfer on the DMA engine implementing the 
> > DMA
> > + * control interface
> > + *
> > + * @dma_ctrl: the DMA engine implementing this interface
> > + * @addr: the address to read
> > + * @len: the number of bytes to read at 'addr'
> > + */
> 
> The prototype seems to be missing here.
> 
> > +} DmaCtrlIfClass;
> > +
> > +
> > +dma_ctrl_if_read
> > +
> > +
> > +The ``dma_ctrl_if_read`` function is used from a model embedding the DMA 
> > engine
> > +for starting DMA read transfers.
> > +
> > +.. code-block:: c
> > +
> > +/*
> > + * Start a read transfer on a DMA engine implementing the DMA control
> > + * interface.
> > + *
> > + * @dma_ctrl: the DMA engine implementing this interface
> > + * @addr: the address to read
> > + * @len: the number of bytes to read at 'addr'
> > + */
> > +void dma_ctrl_if_read(DmaCtrlIf *dma, hwaddr addr, uint32_t len);

Hi Peter,

> 
> The method says it "starts" the transfer. How does the thing on the
> end of the DMA control interface find out when the transfer completes,
> or if there were any errors ?

Yes, I can see that the above is not clear enough at the moment. I'll attempt to
improve and fix this in v6! I'll also correct the other issues you found in the
series!

Thank you very much for reviewing again!

Best regards,
Francisco

> 
> thanks
> -- PMM

Re: [RFC PATCH] block/file-posix: Remove a deprecation warning on macOS 12


On 14.01.22 15:15, Philippe Mathieu-Daudé wrote:

On 14/1/22 15:09, Hanna Reitz wrote:

On 06.01.22 00:56, Philippe Mathieu-Daudé wrote:

When building on macOS 12 we get:

   ../block/file-posix.c:3335:18: warning: 'IOMasterPort' is 
deprecated: first deprecated in macOS 12.0 [-Wdeprecated-declarations]

   kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort );
    ^~~~
    IOMainPort

Use IOMainPort (define it to IOMasterPort on macOS < 12),
and replace 'master' by 'main' in a variable name.

Signed-off-by: Philippe Mathieu-Daudé 
---
  block/file-posix.c | 13 +
  1 file changed, 9 insertions(+), 4 deletions(-)


I hope the [RFC] tag isn’t directed at me.

Still, I can give my comment, of course.


diff --git a/block/file-posix.c b/block/file-posix.c
index b283093e5b..0dcfce1856 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3324,17 +3324,22 @@ BlockDriver bdrv_file = {
  #if defined(__APPLE__) && defined(__MACH__)
  static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char 
*bsdPath,

  CFIndex maxPathSize, int flags);
+
+#if !defined(MAC_OS_VERSION_12_0)


So AFAIU from my quick rather fruit-less googling, this macro is 
defined (to some version-defining integer) on every macOS version 
starting from 12.0?  (Just confirming because the name could also 
mean it’d be defined only on 12.0.)


Thanks, I posted up to v3 and macOS users helped me, I will post a v4 
soon.


v3: 
https://lore.kernel.org/qemu-devel/20220110131001.614319-1-f4...@amsat.org/


I see.  The MAC_OS_X_VERSION_M{IN,AX}_REQUIRED thing was exactly what I 
didn’t really understand from said googling, but the important thing is 
that you do.  (Something to do with what runtime is actually in use 
rather than what the system can provide?  Well, I’ll just stop asking.)  O:)


Hanna
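
For readers following the macro question, here is a minimal sketch of how a guard could combine "is MAC_OS_VERSION_12_0 defined at all" with the MAC_OS_X_VERSION_MIN_REQUIRED deployment target mentioned above. It is only an assumption about the shape of such a check, not the code from v3 or v4; IO_PORT_SYMBOL and io_port_symbol() are hypothetical names used for illustration, and on non-macOS hosts the sketch just reports that it does not apply.

```c
#if defined(__APPLE__) && defined(__MACH__)
#include <AvailabilityMacros.h>
#endif

/*
 * Fall back to the old name when the SDK predates macOS 12 (the macro is
 * not defined at all) or when the deployment target is older than 12.0.
 * IO_PORT_SYMBOL is a hypothetical helper macro for this sketch only.
 */
#if defined(__APPLE__) && defined(__MACH__)
#  if !defined(MAC_OS_VERSION_12_0) || \
      (MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_VERSION_12_0)
#    define IO_PORT_SYMBOL "IOMasterPort"
#  else
#    define IO_PORT_SYMBOL "IOMainPort"
#  endif
#else
#  define IO_PORT_SYMBOL "(not macOS)"
#endif

/* Report which symbol this build configuration would call. */
const char *io_port_symbol(void)
{
    return IO_PORT_SYMBOL;
}
```

The point of the second condition is that the SDK defining MAC_OS_VERSION_12_0 says nothing about the oldest system the binary is meant to run on.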

Re: [PULL 07/16] qapi/block: Restrict vhost-user-blk to CONFIG_VHOST_USER_BLK_SERVER


On 14/1/22 14:52, Kevin Wolf wrote:

From: Philippe Mathieu-Daudé 

When building QEMU with --disable-vhost-user and using introspection,
query-qmp-schema lists vhost-user-blk even though it's not actually
available:

   { "execute": "query-qmp-schema" }
   {
   "return": [
   ...
   {
   "name": "312",
   "members": [
   {
   "name": "nbd"
   },
   {
   "name": "vhost-user-blk"
   }
   ],
   "meta-type": "enum",
   "values": [
   "nbd",
   "vhost-user-blk"
   ]
   },

Restrict vhost-user-blk in BlockExportType when
CONFIG_VHOST_USER_BLK_SERVER is disabled, so it
doesn't end up listed by query-qmp-schema.

Fixes: 90fc91d50b7 ("convert vhost-user-blk server to block export API")
Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20220107105420.395011-4-f4...@amsat.org>
Signed-off-by: Kevin Wolf 
---
  qapi/block-export.json | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index c1b92ce1c1..f9ce79a974 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -277,7 +277,8 @@
  # Since: 4.2
  ##
  { 'enum': 'BlockExportType',
-  'data': [ 'nbd', 'vhost-user-blk',
+  'data': [ 'nbd',
+{ 'name': 'vhost-user-blk', 'if': 'CONFIG_VHOST_USER_BLK_SERVER' },
  { 'name': 'fuse', 'if': 'CONFIG_FUSE' } ] }


Markus asked to split this line:
https://lore.kernel.org/qemu-devel/87zgny37s8@dusky.pond.sub.org/
I will add a cleanup patch, no need to cancel this PR for that ;)
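
As a rough illustration of what an 'if' condition on an enum member achieves, the sketch below mimics a conditionally compiled enum and its name table. It is not the generator's actual output: every Demo*/demo_* identifier is hypothetical, and only the two CONFIG_ macro names are taken from the schema above.

```c
#include <string.h>

/* Hypothetical stand-in for a generated enum with conditional members. */
typedef enum DemoExportType {
    DEMO_EXPORT_TYPE_NBD,
#if defined(CONFIG_VHOST_USER_BLK_SERVER)
    DEMO_EXPORT_TYPE_VHOST_USER_BLK,
#endif
#if defined(CONFIG_FUSE)
    DEMO_EXPORT_TYPE_FUSE,
#endif
    DEMO_EXPORT_TYPE__MAX,
} DemoExportType;

/* Name table guarded by the same conditions as the enum members. */
static const char *const demo_export_type_names[DEMO_EXPORT_TYPE__MAX] = {
    [DEMO_EXPORT_TYPE_NBD] = "nbd",
#if defined(CONFIG_VHOST_USER_BLK_SERVER)
    [DEMO_EXPORT_TYPE_VHOST_USER_BLK] = "vhost-user-blk",
#endif
#if defined(CONFIG_FUSE)
    [DEMO_EXPORT_TYPE_FUSE] = "fuse",
#endif
};

/* Return the enum value for a name, or -1 if it is not compiled in. */
int demo_export_type_parse(const char *name)
{
    for (int i = 0; i < DEMO_EXPORT_TYPE__MAX; i++) {
        if (strcmp(demo_export_type_names[i], name) == 0) {
            return i;
        }
    }
    return -1;
}
```

Built without CONFIG_VHOST_USER_BLK_SERVER, the member simply does not exist, so anything that enumerates the table (as introspection does) cannot report it.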

Re: [PATCH 5/4] tests: acpi: test short OEM_ID/OEM_TABLE_ID values in test_oem_fields()

2022-01-14 Thread Ani Sinha

On Fri, Jan 14, 2022 at 7:57 PM Igor Mammedov  wrote:

> Previous patch [1] added explicit whitespace padding to OEM_ID/OEM_TABLE_ID
> values used in test_oem_fields() testcase to avoid false positive and
> bisection issues when QEMU is switched to '\0' padding. As a result, the
> testcase ceased to test values that were shorter than the maximum possible
> length.
>
> Update testcase to make sure that it's testing shorter IDs like it
> used to before [2].
>
> 1) "tests: acpi: manually pad OEM_ID/OEM_TABLE_ID for test_oem_fields()
> test"
> 2) 602b458201 ("acpi: Permit OEM ID and OEM table ID fields to be changed")
>
> Signed-off-by: Igor Mammedov 


Reviewed-by: Ani Sinha 



>

Re: [PATCH qemu] spapr: Force 32bit when resetting a core

2022-01-14 Thread Cédric Le Goater

On 1/10/22 03:52, Alexey Kardashevskiy wrote:

On 08/01/2022 00:39, Greg Kurz wrote:

On Fri, 7 Jan 2022 23:19:03 +1100
David Gibson wrote:

On Fri, Jan 07, 2022 at 12:57:47PM +0100, Greg Kurz wrote:

On Fri, 7 Jan 2022 18:24:23 +1100
Alexey Kardashevskiy wrote:

"PowerPC Processor binding to IEEE 1275" says in
"8.2.1. Initial Register Values" that the initial state is defined as
32bit so do it for both SLOF and VOF.

This should not cause behavioral change as SLOF switches to 64bit very
early anyway.

Only one CPU goes through SLOF. What about the other ones, including
hot plugged CPUs ?

Those will be started by the start-cpu RTAS call which has its own
semantics.

Ah indeed, there's code in linux/arch/powerpc/kernel/head_64.S to switch
secondaries to 64bit... but then, as noted by Cedric, ppc_cpu_reset(),
which is called earlier sets MSR_SF but the changelog of commit 8b9f2118ca40
doesn't provide much details on the motivation. Any idea ?

https://patchwork.kernel.org/project/qemu-devel/patch/1458121432-2855-1-git-send-email-lviv...@redhat.com/

this is probably it:

===
Reset is properly defined as an exception (0x100). For exceptions, the
970MP user manual for example says:

4.5 Exception Definitions
When an exception/interrupt is taken, all bits in the MSR are set to
‘0’, with the following exceptions:
• Exceptions always set MSR[SF] to ‘1’.
===

but it looks like the above is about emulating a bare metal 970 rather than
a pseries VCPU, so that quote does not apply to spapr.

Yes, more info here :

https://patchwork.kernel.org/project/qemu-devel/patch/1458121432-2855-1-git-send-email-lviv...@redhat.com/

mac99+970 only boots with a 64bit kernel. 32bit kernels are not supported
because of the use of the rfi instruction, which was removed in v2.01. 32bit
user space is supported though.

However, I was not able to build a disk with a compatible boot partition
for OpenBIOS. The above support only applies to kernels loaded in memory.
Maybe Mark knows how to do this?

Anyhow, I didn't see any regression on PAPR with this patch, TCG or KVM.

Thanks,
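
To make the behaviour under discussion concrete, here is a minimal sketch of a generic reset that sets MSR[SF] followed by a machine-level reset that clears it again. All demo_* and DEMO_* names are hypothetical and the struct is a stand-in, not QEMU's CPU state. The only hard fact assumed is that MSR[SF] is the most significant bit of the 64-bit MSR.

```c
#include <stdint.h>

/* MSR[SF]: 1 selects 64-bit mode, 0 selects 32-bit mode. */
#define DEMO_MSR_SF_BIT 63

/* Hypothetical stand-in for the per-CPU register state. */
typedef struct DemoCpuState {
    uint64_t msr;
} DemoCpuState;

/* Generic CPU reset as described in the thread: SF ends up set. */
void demo_cpu_reset(DemoCpuState *cpu)
{
    cpu->msr = 1ULL << DEMO_MSR_SF_BIT;
}

/* Machine-level reset: run the generic reset, then force 32-bit mode. */
void demo_spapr_cpu_reset(DemoCpuState *cpu)
{
    demo_cpu_reset(cpu);
    cpu->msr &= ~(1ULL << DEMO_MSR_SF_BIT);
}

/* Non-zero when the CPU would start in 64-bit mode. */
int demo_is_64bit(const DemoCpuState *cpu)
{
    return (int)((cpu->msr >> DEMO_MSR_SF_BIT) & 1);
}
```

Clearing the bit in the machine-specific path leaves the generic reset, and therefore the bare metal 970 case, untouched.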

Re: [PATCH 2/2] hw/virtio: add vhost-user-gpio-pci boilerplate



Viresh Kumar  writes:

> This allows us to instantiate a vhost-user-gpio device as part of a PCI
> bus. It is mostly boilerplate which looks pretty similar to the
> vhost-user-fs-pci device.
>
> Signed-off-by: Viresh Kumar 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée

Re: [RFC PATCH] block/file-posix: Remove a deprecation warning on macOS 12


On 06.01.22 00:56, Philippe Mathieu-Daudé wrote:

When building on macOS 12 we get:

   ../block/file-posix.c:3335:18: warning: 'IOMasterPort' is deprecated: first 
deprecated in macOS 12.0 [-Wdeprecated-declarations]
   kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort );
^~~~
IOMainPort

Use IOMainPort (define it to IOMasterPort on macOS < 12),
and replace 'master' by 'main' in a variable name.

Signed-off-by: Philippe Mathieu-Daudé 
---
  block/file-posix.c | 13 +
  1 file changed, 9 insertions(+), 4 deletions(-)


I hope the [RFC] tag isn’t directed at me.

Still, I can give my comment, of course.


diff --git a/block/file-posix.c b/block/file-posix.c
index b283093e5b..0dcfce1856 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3324,17 +3324,22 @@ BlockDriver bdrv_file = {
  #if defined(__APPLE__) && defined(__MACH__)
  static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
  CFIndex maxPathSize, int flags);
+
+#if !defined(MAC_OS_VERSION_12_0)


So AFAIU from my quick rather fruit-less googling, this macro is defined 
(to some version-defining integer) on every macOS version starting from 
12.0?  (Just confirming because the name could also mean it’d be defined 
only on 12.0.)



+#define IOMainPort IOMasterPort
+#endif
+
  static char *FindEjectableOpticalMedia(io_iterator_t *mediaIterator)
  {
  kern_return_t kernResult = KERN_FAILURE;
-mach_port_t masterPort;
+mach_port_t mainPort;
  CFMutableDictionaryRef  classesToMatch;
  const char *matching_array[] = {kIODVDMediaClass, kIOCDMediaClass};
  char *mediaType = NULL;
  
-kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort );

+kernResult = IOMainPort(MACH_PORT_NULL, &mainPort);
  if ( KERN_SUCCESS != kernResult ) {
-printf( "IOMasterPort returned %d\n", kernResult );
+printf("IOMainPort returned %d\n", kernResult);
  }
  
  int index;

@@ -3347,7 +3352,7 @@ static char *FindEjectableOpticalMedia(io_iterator_t 
*mediaIterator)
  }
  CFDictionarySetValue(classesToMatch, CFSTR(kIOMediaEjectableKey),
   kCFBooleanTrue);
-kernResult = IOServiceGetMatchingServices(masterPort, classesToMatch,
+kernResult = IOServiceGetMatchingServices(mainPort, classesToMatch,
mediaIterator);
  if (kernResult != KERN_SUCCESS) {
  error_report("Note: IOServiceGetMatchingServices returned %d",


“Looks good to me” ← here’s the comment you requested O:)

Hanna

Re: [PATCH v3 3/3] target/riscv: add support for svpbmt extension

2022-01-14 Thread Weiwei Li




On 2022/1/14 at 9:59 PM, Anup Patel wrote:

On Fri, Jan 14, 2022 at 7:11 AM Weiwei Li  wrote:

It uses two PTE bits, but otherwise has no effect on QEMU, since QEMU is 
sequentially consistent and doesn't model PMAs currently

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Tested-by: Heiko Stuebner 
---
  target/riscv/cpu.c| 1 +
  target/riscv/cpu.h| 1 +
  target/riscv/cpu_bits.h   | 3 +++
  target/riscv/cpu_helper.c | 9 -
  4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 45ac98e06b..4f82bd00a3 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -670,6 +670,7 @@ static Property riscv_cpu_properties[] = {

  DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false),
  DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),
+DEFINE_PROP_BOOL("svpbmt", RISCVCPU, cfg.ext_svpbmt, false),

  DEFINE_PROP_BOOL("zba", RISCVCPU, cfg.ext_zba, true),
  DEFINE_PROP_BOOL("zbb", RISCVCPU, cfg.ext_zbb, true),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index c3d1845ca1..53f314c752 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -329,6 +329,7 @@ struct RISCVCPU {
  bool ext_icsr;
  bool ext_svinval;
  bool ext_svnapot;
+bool ext_svpbmt;
  bool ext_zfh;
  bool ext_zfhmin;

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index bc23e3b523..ee294c1d0b 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -486,7 +486,10 @@ typedef enum {
  #define PTE_A   0x040 /* Accessed */
  #define PTE_D   0x080 /* Dirty */
  #define PTE_SOFT    0x300 /* Reserved for Software */
+#define PTE_RSVD    0x1FC0000000000000 /* Reserved for future use */
+#define PTE_PBMT    0x6000000000000000 /* Page-based memory types */
  #define PTE_N       0x8000000000000000 /* NAPOT translation */
+#define PTE_ATTR    0xFFC0000000000000 /* All attributes bits */

  /* Page table PPN shift amount */
  #define PTE_PPN_SHIFT   10
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 58ab85bca3..f90766e026 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -619,16 +619,23 @@ restart:
  return TRANSLATE_FAIL;
  }

-hwaddr ppn = (pte & ~(target_ulong)PTE_N) >> PTE_PPN_SHIFT;
+hwaddr ppn = (pte & ~(target_ulong)PTE_ATTR) >> PTE_PPN_SHIFT;

  RISCVCPU *cpu = env_archcpu(env);
  if (!cpu->cfg.ext_svnapot && (pte & PTE_N)) {
  return TRANSLATE_FAIL;
+} else if (!cpu->cfg.ext_svpbmt && (pte & PTE_PBMT)) {
+return TRANSLATE_FAIL;
+} else if (pte & PTE_RSVD) {
+return TRANSLATE_FAIL;
  } else if (!(pte & PTE_V)) {
  /* Invalid PTE */
  return TRANSLATE_FAIL;
  } else if (!(pte & (PTE_R | PTE_W | PTE_X))) {
  /* Inner PTE, continue walking */
+if (pte & (PTE_D | PTE_A | PTE_U | PTE_N | PTE_PBMT)) {
+return TRANSLATE_FAIL;
+}

I think you should add a patch before PATCH1 to add the following:

if (pte & (PTE_D | PTE_A | PTE_U)) {
 return TRANSLATE_FAIL;
}

The current PATCH1 should add PTE_N to the comparison and
this patch can add PTE_PBMT to the comparison.

OK. I'll update this.

  base = ppn << PGSHIFT;
  } else if ((pte & (PTE_R | PTE_W | PTE_X)) == PTE_W) {
  /* Reserved leaf PTE flags: PTE_W */
--
2.17.1


Apart from the minor comment above, it looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup


Regards,

Weiwei Li
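
For anyone reading along, the attribute checks in this patch can be sketched in isolation as below. This is a hedged illustration, not the patch itself: the DEMO_* macros and demo_* functions are hypothetical names, and the bit positions are the ones the RISC-V privileged spec gives for 64-bit PTEs (N is bit 63, PBMT is bits 62:61, bits 60:54 are reserved).

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PTE_V         0x001ULL               /* Valid */
#define DEMO_PTE_RSVD      0x1FC0000000000000ULL  /* bits 60:54, reserved */
#define DEMO_PTE_PBMT      0x6000000000000000ULL  /* bits 62:61, Svpbmt */
#define DEMO_PTE_N         0x8000000000000000ULL  /* bit 63, Svnapot */
#define DEMO_PTE_ATTR      (DEMO_PTE_N | DEMO_PTE_PBMT | DEMO_PTE_RSVD)
#define DEMO_PTE_PPN_SHIFT 10

/* Strip every attribute bit above the PPN before shifting. */
uint64_t demo_pte_ppn(uint64_t pte)
{
    return (pte & ~DEMO_PTE_ATTR) >> DEMO_PTE_PPN_SHIFT;
}

/* Reject PTEs that use attribute bits the configured CPU does not have. */
bool demo_pte_attr_ok(uint64_t pte, bool ext_svnapot, bool ext_svpbmt)
{
    if (!ext_svnapot && (pte & DEMO_PTE_N)) {
        return false;
    }
    if (!ext_svpbmt && (pte & DEMO_PTE_PBMT)) {
        return false;
    }
    if (pte & DEMO_PTE_RSVD) {
        return false;
    }
    return (pte & DEMO_PTE_V) != 0;
}
```

Masking with the full attribute set rather than with N alone is what keeps a PBMT value from leaking into the physical page number.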

Re: [PATCH 1/2] hw/virtio: add boilerplate for vhost-user-gpio device

Viresh Kumar  writes:

> This creates the QEMU side of the vhost-user-gpio device which connects
> to the remote daemon. It is based on the vhost-user-i2c code.
>
> Signed-off-by: Viresh Kumar 

> +++ b/include/hw/virtio/vhost-user-gpio.h
> @@ -0,0 +1,35 @@
> +/*
> + * Vhost-user GPIO virtio device
> + *
> + * Copyright (c) 2021 Viresh Kumar 
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef _QEMU_VHOST_USER_GPIO_H
> +#define _QEMU_VHOST_USER_GPIO_H
> +
> +#include "hw/virtio/virtio.h"
> +#include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-user.h"
> +#include "standard-headers/linux/virtio_gpio.h"

Hmm this fails:

  In file included from ../../hw/virtio/vhost-user-gpio.c:13:
  /home/alex/lsrc/qemu.git/include/hw/virtio/vhost-user-gpio.h:15:10: fatal 
error: standard-headers/linux/virtio_gpio.h: No such file or directory
 15 | #include "standard-headers/linux/virtio_gpio.h"
|  ^~
  compilation terminated.

The usual solution is to create a patch that imports the headers using:

  ./scripts/update-linux-headers.sh

either from the current mainline (or your own tree if the feature is in
flight) and mark the patch clearly as not for merging.

-- 
Alex Bennée

Re: [RFC PATCH] block/file-posix: Remove a deprecation warning on macOS 12


On 14/1/22 15:09, Hanna Reitz wrote:

On 06.01.22 00:56, Philippe Mathieu-Daudé wrote:

When building on macOS 12 we get:

   ../block/file-posix.c:3335:18: warning: 'IOMasterPort' is 
deprecated: first deprecated in macOS 12.0 [-Wdeprecated-declarations]

   kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort );
    ^~~~
    IOMainPort

Use IOMainPort (define it to IOMasterPort on macOS < 12),
and replace 'master' by 'main' in a variable name.

Signed-off-by: Philippe Mathieu-Daudé 
---
  block/file-posix.c | 13 +
  1 file changed, 9 insertions(+), 4 deletions(-)


I hope the [RFC] tag isn’t directed at me.

Still, I can give my comment, of course.


diff --git a/block/file-posix.c b/block/file-posix.c
index b283093e5b..0dcfce1856 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3324,17 +3324,22 @@ BlockDriver bdrv_file = {
  #if defined(__APPLE__) && defined(__MACH__)
  static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char 
*bsdPath,

  CFIndex maxPathSize, int flags);
+
+#if !defined(MAC_OS_VERSION_12_0)


So AFAIU from my quick rather fruit-less googling, this macro is defined 
(to some version-defining integer) on every macOS version starting from 
12.0?  (Just confirming because the name could also mean it’d be defined 
only on 12.0.)


Thanks, I posted up to v3 and macOS users helped me, I will post a v4 soon.

v3: 
https://lore.kernel.org/qemu-devel/20220110131001.614319-1-f4...@amsat.org/



+#define IOMainPort IOMasterPort
+#endif
+
  static char *FindEjectableOpticalMedia(io_iterator_t *mediaIterator)
  {
  kern_return_t kernResult = KERN_FAILURE;
-    mach_port_t masterPort;
+    mach_port_t mainPort;
  CFMutableDictionaryRef  classesToMatch;
  const char *matching_array[] = {kIODVDMediaClass, kIOCDMediaClass};
  char *mediaType = NULL;
-    kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort );
+    kernResult = IOMainPort(MACH_PORT_NULL, &mainPort);
  if ( KERN_SUCCESS != kernResult ) {
-    printf( "IOMasterPort returned %d\n", kernResult );
+    printf("IOMainPort returned %d\n", kernResult);
  }
  int index;
@@ -3347,7 +3352,7 @@ static char 
*FindEjectableOpticalMedia(io_iterator_t *mediaIterator)

  }
  CFDictionarySetValue(classesToMatch, 
CFSTR(kIOMediaEjectableKey),

   kCFBooleanTrue);
-    kernResult = IOServiceGetMatchingServices(masterPort, 
classesToMatch,
+    kernResult = IOServiceGetMatchingServices(mainPort, 
classesToMatch,

    mediaIterator);
  if (kernResult != KERN_SUCCESS) {
  error_report("Note: IOServiceGetMatchingServices 
returned %d",


“Looks good to me” ← here’s the comment you requested O:)


Thanks :)

Re: [PATCH v3 2/3] target/riscv: add support for svinval extension

2022-01-14 Thread Weiwei Li




On 2022/1/14 at 10:01 PM, Anup Patel wrote:

On Fri, Jan 14, 2022 at 7:24 PM Weiwei Li  wrote:

Thanks for your comments.

On 2022/1/14 at 9:40 PM, Anup Patel wrote:

On Fri, Jan 14, 2022 at 7:11 AM Weiwei Li  wrote:

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
  target/riscv/cpu.c  |  1 +
  target/riscv/cpu.h  |  1 +
  target/riscv/insn32.decode  |  7 ++
  target/riscv/insn_trans/trans_svinval.c.inc | 75 +
  target/riscv/translate.c|  1 +
  5 files changed, 85 insertions(+)
  create mode 100644 target/riscv/insn_trans/trans_svinval.c.inc

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index ff6c86c85b..45ac98e06b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -668,6 +668,7 @@ static Property riscv_cpu_properties[] = {
  DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
  DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),

+DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false),
  DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),

  DEFINE_PROP_BOOL("zba", RISCVCPU, cfg.ext_zba, true),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index d3d17cde82..c3d1845ca1 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -327,6 +327,7 @@ struct RISCVCPU {
  bool ext_counters;
  bool ext_ifencei;
  bool ext_icsr;
+bool ext_svinval;
  bool ext_svnapot;
  bool ext_zfh;
  bool ext_zfhmin;
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5bbedc254c..7a0351fde2 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -809,3 +809,10 @@ fcvt_l_h   1100010  00010 . ... . 1010011 @r2_rm
  fcvt_lu_h  1100010  00011 . ... . 1010011 @r2_rm
  fcvt_h_l   1101010  00010 . ... . 1010011 @r2_rm
  fcvt_h_lu  1101010  00011 . ... . 1010011 @r2_rm
+
+# *** Svinval Standard Extension ***
+sinval_vma0001011 . . 000 0 1110011 @sfence_vma
+sfence_w_inval0001100 0 0 000 0 1110011
+sfence_inval_ir   0001100 1 0 000 0 1110011
+hinval_vvma   0011011 . . 000 0 1110011 @hfence_vvma

s/0011011/0010011/

+hinval_gvma   0111011 . . 000 0 1110011 @hfence_gvma

s/0111011/0110011/

Sorry. I couldn't find the encodings for the svinval instructions in the spec,
so I took them from spike
(https://github.com/riscv-software-src/riscv-isa-sim/blob/master/riscv/encoding.h),
which has the following:

#define MATCH_HINVAL_VVMA 0x36000073
#define MASK_HINVAL_VVMA 0xfe007fff
#define MATCH_HINVAL_GVMA 0x76000073
#define MASK_HINVAL_GVMA 0xfe007fff
Are they not the latest encodings?

The code in Spike seems to be buggy but that's a separate issue.

Refer, page 138 of
https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220110-eae4f00/riscv-privileged.pdf

Regards,
Anup


OK. Thanks a lot. I'll fix this.

Regards,

Weiwei Li
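
A quick way to cross-check a decode line against a MATCH_* style constant is to rebuild the constant from the funct7 field. The sketch below assumes only the standard R-type layout (funct7 in bits 31:25) and the SYSTEM major opcode 0x73. The demo_* helpers are hypothetical names, not part of QEMU or spike.

```c
#include <stdint.h>

#define DEMO_OPCODE_SYSTEM 0x73u

/* Build the match value for a SYSTEM instruction with rs1/rs2/rd left 0. */
uint32_t demo_match_from_funct7(uint32_t funct7)
{
    return ((funct7 & 0x7fu) << 25) | DEMO_OPCODE_SYSTEM;
}

/* Extract funct7 (bits 31:25) from an instruction word or match value. */
uint32_t demo_funct7_of(uint32_t insn)
{
    return insn >> 25;
}
```

With this, funct7 0010011 (0x13) gives 0x26000073 and 0110011 (0x33) gives 0x66000073, while 0011011 (0x1b) and 0111011 (0x3b) give 0x36000073 and 0x76000073.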


diff --git a/target/riscv/insn_trans/trans_svinval.c.inc 
b/target/riscv/insn_trans/trans_svinval.c.inc
new file mode 100644
index 00..1dde665661
--- /dev/null
+++ b/target/riscv/insn_trans/trans_svinval.c.inc
@@ -0,0 +1,75 @@
+/*
+ * RISC-V translation routines for the Svinval Standard Instruction Set.
+ *
+ * Copyright (c) 2020-2021 PLCT lab
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#define REQUIRE_SVINVAL(ctx) do {\
+if (!RISCV_CPU(ctx->cs)->cfg.ext_svinval) {  \
+return false;\
+}\
+} while (0)
+
+static bool trans_sinval_vma(DisasContext *ctx, arg_sinval_vma *a)
+{
+REQUIRE_SVINVAL(ctx);
+/* Do the same as sfence.vma currently */
+REQUIRE_EXT(ctx, RVS);
+#ifndef CONFIG_USER_ONLY
+gen_helper_tlb_flush(cpu_env);
+return true;
+#endif
+return false;
+}
+
+static bool trans_sfence_w_inval(DisasContext *ctx, arg_sfence_w_inval *a)
+{
+REQUIRE_SVINVAL(ctx);
+REQUIRE_EXT(ctx, RVS);
+/* Do nothing currently */
+return true;
+}
+
+static bool trans_sfence_inval_ir(DisasContext *ctx, arg_sfence_inval_ir *a)
+{
+REQUIRE_SVINVAL(ctx);
+REQUIRE_EXT(ctx, RVS);
+/* Do nothing currently */
+return true;
+}
+
+static bool trans_hinval_vvma(DisasContext *ctx, arg_hinval_vvma *a)
+{
+REQUIRE_SVINVAL(ct

[PATCH v5 6/6] hw/arm/virt: Drop superfluous checks against highmem

Now that the devices present in the extended memory map are checked
against the available PA space and disabled when they don't fit,
there is no need to keep the same checks against highmem, as
highmem really is a shortcut for the PA space being 32bit.

Reviewed-by: Eric Auger 
Signed-off-by: Marc Zyngier 
---
 hw/arm/virt-acpi-build.c | 2 --
 hw/arm/virt.c| 5 +
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 0757c28f69..449fab0080 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -947,8 +947,6 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables 
*tables)
 acpi_add_table(table_offsets, tables_blob);
 build_fadt_rev5(tables_blob, tables->linker, vms, dsdt);
 
-vms->highmem_redists &= vms->highmem;
-
 acpi_add_table(table_offsets, tables_blob);
 build_madt(tables_blob, tables->linker, vms);
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 053791cc44..4524f3807d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2171,9 +2171,6 @@ static void machvirt_init(MachineState *machine)
 
 virt_flash_fdt(vms, sysmem, secure_sysmem ?: sysmem);
 
-vms->highmem_mmio &= vms->highmem;
-vms->highmem_redists &= vms->highmem;
-
 create_gic(vms, sysmem);
 
 virt_cpu_post_init(vms, sysmem);
@@ -2192,7 +2189,7 @@ static void machvirt_init(MachineState *machine)
machine->ram_size, "mach-virt.tag");
 }
 
-vms->highmem_ecam &= vms->highmem && (!firmware_loaded || aarch64);
+vms->highmem_ecam &= (!firmware_loaded || aarch64);
 
 create_rtc(vms);
 
-- 
2.30.2

[PATCH v5 1/6] hw/arm/virt: Add a control for the highmem PCIe MMIO

Just like we can control the enablement of the highmem PCIe ECAM
region using highmem_ecam, let's add a control for the highmem
PCIe MMIO region.

Similarly to highmem_ecam, this region is disabled when highmem
is off.

Signed-off-by: Marc Zyngier 
---
 hw/arm/virt-acpi-build.c | 10 --
 hw/arm/virt.c|  7 +--
 include/hw/arm/virt.h|  1 +
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f2514ce77c..449fab0080 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -158,10 +158,9 @@ static void acpi_dsdt_add_virtio(Aml *scope,
 }
 
 static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
-  uint32_t irq, bool use_highmem, bool 
highmem_ecam,
-  VirtMachineState *vms)
+  uint32_t irq, VirtMachineState *vms)
 {
-int ecam_id = VIRT_ECAM_ID(highmem_ecam);
+int ecam_id = VIRT_ECAM_ID(vms->highmem_ecam);
 struct GPEXConfig cfg = {
 .mmio32 = memmap[VIRT_PCIE_MMIO],
 .pio= memmap[VIRT_PCIE_PIO],
@@ -170,7 +169,7 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry 
*memmap,
 .bus= vms->bus,
 };
 
-if (use_highmem) {
+if (vms->highmem_mmio) {
 cfg.mmio64 = memmap[VIRT_HIGH_PCIE_MMIO];
 }
 
@@ -869,8 +868,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 acpi_dsdt_add_fw_cfg(scope, &memmap[VIRT_FW_CFG]);
 acpi_dsdt_add_virtio(scope, &memmap[VIRT_MMIO],
 (irqmap[VIRT_MMIO] + ARM_SPI_BASE), NUM_VIRTIO_TRANSPORTS);
-acpi_dsdt_add_pci(scope, memmap, (irqmap[VIRT_PCIE] + ARM_SPI_BASE),
-  vms->highmem, vms->highmem_ecam, vms);
+acpi_dsdt_add_pci(scope, memmap, irqmap[VIRT_PCIE] + ARM_SPI_BASE, vms);
 if (vms->acpi_dev) {
 build_ged_aml(scope, "\\_SB."GED_DEVICE,
   HOTPLUG_HANDLER(vms->acpi_dev),
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b45b52c90e..ed8ea96acc 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1412,7 +1412,7 @@ static void create_pcie(VirtMachineState *vms)
  mmio_reg, base_mmio, size_mmio);
 memory_region_add_subregion(get_system_memory(), base_mmio, mmio_alias);
 
-if (vms->highmem) {
+if (vms->highmem_mmio) {
 /* Map high MMIO space */
 MemoryRegion *high_mmio_alias = g_new0(MemoryRegion, 1);
 
@@ -1466,7 +1466,7 @@ static void create_pcie(VirtMachineState *vms)
 qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg",
  2, base_ecam, 2, size_ecam);
 
-if (vms->highmem) {
+if (vms->highmem_mmio) {
 qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "ranges",
  1, FDT_PCI_RANGE_IOPORT, 2, 0,
  2, base_pio, 2, size_pio,
@@ -2105,6 +2105,8 @@ static void machvirt_init(MachineState *machine)
 
 virt_flash_fdt(vms, sysmem, secure_sysmem ?: sysmem);
 
+vms->highmem_mmio &= vms->highmem;
+
 create_gic(vms, sysmem);
 
 virt_cpu_post_init(vms, sysmem);
@@ -2802,6 +2804,7 @@ static void virt_instance_init(Object *obj)
 vms->gic_version = VIRT_GIC_VERSION_NOSEL;
 
 vms->highmem_ecam = !vmc->no_highmem_ecam;
+vms->highmem_mmio = true;
 
 if (vmc->no_its) {
 vms->its = false;
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index dc6b66ffc8..9c54acd10d 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -143,6 +143,7 @@ struct VirtMachineState {
 bool secure;
 bool highmem;
 bool highmem_ecam;
+bool highmem_mmio;
 bool its;
 bool tcg_its;
 bool virt;
-- 
2.30.2

[PATCH v5 2/6] hw/arm/virt: Add a control for the highmem redistributors

Just like we can control the enablement of the highmem PCIe region
using highmem_ecam, let's add a control for the highmem GICv3
redistributor region.

Similarly to highmem_ecam, these redistributors are disabled when
highmem is off.
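
As a rough illustration of the gating described above, the sketch below models how the redistributor region count could follow once the new flag is masked by highmem. The struct and function names are hypothetical stand-ins, not QEMU's own types, and the capacity of the first region is simply passed in as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant VirtMachineState fields. */
struct virt_flags {
    bool highmem;
    bool highmem_redists;
};

/* Mirror of "highmem_redists &= highmem": the flag cannot outlive highmem. */
static void finalize_flags(struct virt_flags *f)
{
    assert(f != NULL);
    f->highmem_redists = f->highmem_redists && f->highmem;
}

/* A second region is only used when it is both needed and allowed. */
static int redist_region_count(const struct virt_flags *f,
                               unsigned int cpus,
                               unsigned int redist0_capacity)
{
    assert(f != NULL);
    return (cpus > redist0_capacity && f->highmem_redists) ? 2 : 1;
}
```

With highmem off, the count stays at 1 regardless of the number of vCPUs, so the machine never advertises a region it cannot place.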

Reviewed-by: Andrew Jones 
Signed-off-by: Marc Zyngier 
---
 hw/arm/virt-acpi-build.c | 2 ++
 hw/arm/virt.c| 2 ++
 include/hw/arm/virt.h| 4 +++-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 449fab0080..0757c28f69 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -947,6 +947,8 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables 
*tables)
 acpi_add_table(table_offsets, tables_blob);
 build_fadt_rev5(tables_blob, tables->linker, vms, dsdt);
 
+vms->highmem_redists &= vms->highmem;
+
 acpi_add_table(table_offsets, tables_blob);
 build_madt(tables_blob, tables->linker, vms);
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ed8ea96acc..e734a75850 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2106,6 +2106,7 @@ static void machvirt_init(MachineState *machine)
 virt_flash_fdt(vms, sysmem, secure_sysmem ?: sysmem);
 
 vms->highmem_mmio &= vms->highmem;
+vms->highmem_redists &= vms->highmem;
 
 create_gic(vms, sysmem);
 
@@ -2805,6 +2806,7 @@ static void virt_instance_init(Object *obj)
 
 vms->highmem_ecam = !vmc->no_highmem_ecam;
 vms->highmem_mmio = true;
+vms->highmem_redists = true;
 
 if (vmc->no_its) {
 vms->its = false;
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 9c54acd10d..dc9fa26faa 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -144,6 +144,7 @@ struct VirtMachineState {
 bool highmem;
 bool highmem_ecam;
 bool highmem_mmio;
+bool highmem_redists;
 bool its;
 bool tcg_its;
 bool virt;
@@ -190,7 +191,8 @@ static inline int 
virt_gicv3_redist_region_count(VirtMachineState *vms)
 
 assert(vms->gic_version == VIRT_GIC_VERSION_3);
 
-return MACHINE(vms)->smp.cpus > redist0_capacity ? 2 : 1;
+return (MACHINE(vms)->smp.cpus > redist0_capacity &&
+vms->highmem_redists) ? 2 : 1;
 }
 
 #endif /* QEMU_ARM_VIRT_H */
-- 
2.30.2

[PATCH v5 3/6] hw/arm/virt: Honor highmem setting when computing the memory map

Even when the VM is configured with highmem=off, the highest_gpa
field includes devices that are above the 4GiB limit.
Similarly, nothing seems to check that the memory is within
the limit set by the highmem=off option.

This leads to failures in virt_kvm_type() on systems that have
a crippled IPA range, as the reported IPA space is larger than
what it should be.

Instead, honor the user-specified limit to only use the devices
at the lowest end of the spectrum, and fail if we have memory
crossing the 4GiB limit.
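
A minimal sketch of the check described above, assuming the top of memory is the device memory base plus the device memory size rounded up to 1GiB, as in the hunk below. The helper names are made up for illustration and are not part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * Return true when RAM plus device memory is acceptable under the
 * highmem setting: with highmem off, nothing may end above 4GiB.
 * The computed top of memory is handed back through *memtop.
 */
static bool memtop_is_valid(bool highmem, uint64_t device_memory_base,
                            uint64_t device_memory_size, uint64_t *memtop)
{
    *memtop = device_memory_base + round_up_pow2(device_memory_size, GiB);
    return highmem || *memtop <= 4 * GiB;
}
```

A caller would report an error and exit when this returns false, which is what the patch does with error_report() and exit().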

Reviewed-by: Andrew Jones 
Reviewed-by: Eric Auger 
Signed-off-by: Marc Zyngier 
---
 hw/arm/virt.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e734a75850..ecc3e3e5b0 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1663,7 +1663,7 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_memmap(VirtMachineState *vms)
 {
 MachineState *ms = MACHINE(vms);
-hwaddr base, device_memory_base, device_memory_size;
+hwaddr base, device_memory_base, device_memory_size, memtop;
 int i;
 
 vms->memmap = extended_memmap;
@@ -1690,7 +1690,11 @@ static void virt_set_memmap(VirtMachineState *vms)
 device_memory_size = ms->maxram_size - ms->ram_size + ms->ram_slots * GiB;
 
 /* Base address of the high IO region */
-base = device_memory_base + ROUND_UP(device_memory_size, GiB);
+memtop = base = device_memory_base + ROUND_UP(device_memory_size, GiB);
+if (!vms->highmem && memtop > 4 * GiB) {
+error_report("highmem=off, but memory crosses the 4GiB limit\n");
+exit(EXIT_FAILURE);
+}
 if (base < device_memory_base) {
 error_report("maxmem/slots too huge");
 exit(EXIT_FAILURE);
@@ -1707,7 +1711,7 @@ static void virt_set_memmap(VirtMachineState *vms)
 vms->memmap[i].size = size;
 base += size;
 }
-vms->highest_gpa = base - 1;
+vms->highest_gpa = (vms->highmem ? base : memtop) - 1;
 if (device_memory_size > 0) {
 ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
 ms->device_memory->base = device_memory_base;
-- 
2.30.2

[PATCH v5 4/6] hw/arm/virt: Use the PA range to compute the memory map

The highmem attribute is nothing but another way to express the
PA range of a VM. To support HW that has a smaller PA range than
what QEMU assumes, pass this PA range to the virt_set_memmap()
function, allowing it to correctly exclude highmem devices
if they are outside of the PA range.
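
The idea above can be sketched as follows, assuming a PA size expressed in bits and a high IO region that ends at some address computed elsewhere. Function names are hypothetical; only the arithmetic is taken from the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 2^bits as a 64-bit value; only meaningful for bits < 64. */
static uint64_t bit_ull(int bits)
{
    assert(bits >= 0 && bits < 64);
    return 1ULL << bits;
}

/* highmem=off is treated as a 32-bit PA space, whatever the HW offers. */
static int effective_pa_bits(bool highmem, int hw_pa_bits)
{
    return highmem ? hw_pa_bits : 32;
}

/*
 * Pick the highest guest physical address: the end of the high IO
 * region if it fits in the PA space, otherwise the end of RAM.
 * The caller is expected to have rejected memtop > 2^pa_bits already.
 */
static uint64_t highest_gpa(uint64_t memtop, uint64_t high_io_end, int pa_bits)
{
    assert(memtop <= bit_ull(pa_bits));
    return (high_io_end <= bit_ull(pa_bits) ? high_io_end : memtop) - 1;
}
```

Treating highmem=off as "32 bits" means one comparison against 2^pa_bits covers both the user-imposed and the hardware-imposed limit.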

Signed-off-by: Marc Zyngier 
---
 hw/arm/virt.c | 64 +--
 1 file changed, 52 insertions(+), 12 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ecc3e3e5b0..a427676b50 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1660,7 +1660,7 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
-static void virt_set_memmap(VirtMachineState *vms)
+static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 {
 MachineState *ms = MACHINE(vms);
 hwaddr base, device_memory_base, device_memory_size, memtop;
@@ -1678,6 +1678,14 @@ static void virt_set_memmap(VirtMachineState *vms)
 exit(EXIT_FAILURE);
 }
 
+/*
+ * !highmem is exactly the same as limiting the PA space to 32bit,
+ * irrespective of the underlying capabilities of the HW.
+ */
+if (!vms->highmem) {
+pa_bits = 32;
+}
+
 /*
  * We compute the base of the high IO region depending on the
  * amount of initial and device memory. The device memory start/size
@@ -1691,8 +1699,9 @@ static void virt_set_memmap(VirtMachineState *vms)
 
 /* Base address of the high IO region */
 memtop = base = device_memory_base + ROUND_UP(device_memory_size, GiB);
-if (!vms->highmem && memtop > 4 * GiB) {
-error_report("highmem=off, but memory crosses the 4GiB limit\n");
+if (memtop > BIT_ULL(pa_bits)) {
+   error_report("Addressing limited to %d bits, but memory exceeds it 
by %llu bytes\n",
+pa_bits, memtop - BIT_ULL(pa_bits));
 exit(EXIT_FAILURE);
 }
 if (base < device_memory_base) {
@@ -1711,7 +1720,13 @@ static void virt_set_memmap(VirtMachineState *vms)
 vms->memmap[i].size = size;
 base += size;
 }
-vms->highest_gpa = (vms->highmem ? base : memtop) - 1;
+
+/*
+ * If base fits within pa_bits, all good. If it doesn't, limit it
+ * to the end of RAM, which is guaranteed to fit within pa_bits.
+ */
+vms->highest_gpa = (base <= BIT_ULL(pa_bits) ? base : memtop) - 1;
+
 if (device_memory_size > 0) {
 ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
 ms->device_memory->base = device_memory_base;
@@ -1902,12 +1917,43 @@ static void machvirt_init(MachineState *machine)
 unsigned int smp_cpus = machine->smp.cpus;
 unsigned int max_cpus = machine->smp.max_cpus;
 
+if (!cpu_type_valid(machine->cpu_type)) {
+error_report("mach-virt: CPU type %s not supported", 
machine->cpu_type);
+exit(1);
+}
+
+possible_cpus = mc->possible_cpu_arch_ids(machine);
+
 /*
  * In accelerated mode, the memory map is computed earlier in kvm_type()
  * to create a VM with the right number of IPA bits.
  */
 if (!vms->memmap) {
-virt_set_memmap(vms);
+Object *cpuobj;
+ARMCPU *armcpu;
+int pa_bits;
+
+/*
+ * Instanciate a temporary CPU object to find out about what
+ * we are about to deal with. Once this is done, get rid of
+ * the object.
+ */
+cpuobj = object_new(possible_cpus->cpus[0].type);
+armcpu = ARM_CPU(cpuobj);
+
+if (object_property_get_bool(cpuobj, "aarch64", NULL)) {
+pa_bits = arm_pamax(armcpu);
+} else if (arm_feature(&armcpu->env, ARM_FEATURE_LPAE)) {
+/* v7 with LPAE */
+pa_bits = 40;
+} else {
+/* Anything else */
+pa_bits = 32;
+}
+
+object_unref(cpuobj);
+
+virt_set_memmap(vms, pa_bits);
 }
 
 /* We can probe only here because during property set
@@ -1915,11 +1961,6 @@ static void machvirt_init(MachineState *machine)
  */
 finalize_gic_version(vms);
 
-if (!cpu_type_valid(machine->cpu_type)) {
-error_report("mach-virt: CPU type %s not supported", 
machine->cpu_type);
-exit(1);
-}
-
 if (vms->secure) {
 /*
  * The Secure view of the world is the same as the NonSecure,
@@ -1989,7 +2030,6 @@ static void machvirt_init(MachineState *machine)
 
 create_fdt(vms);
 
-possible_cpus = mc->possible_cpu_arch_ids(machine);
 assert(possible_cpus->len == max_cpus);
 for (n = 0; n < possible_cpus->len; n++) {
 Object *cpuobj;
@@ -2646,7 +2686,7 @@ static int virt_kvm_type(MachineState *ms, const char 
*type_str)
 max_vm_pa_size = kvm_arm_get_max_vm_ipa_size(ms, &fixed_ipa);
 
 /* we freeze the memory map to compute the highest gpa */
-virt_set_memmap(vms);
+virt_set_memmap(vms, max_vm_pa_size);
 
 requested_pa_size = 64 - clz64(vms->h

[PATCH 5/4] tests: acpi: test short OEM_ID/OEM_TABLE_ID values in test_oem_fields()

2022-01-14 Thread Igor Mammedov

The previous patch [1] added explicit whitespace padding to the
OEM_ID/OEM_TABLE_ID values used in the test_oem_fields() testcase to avoid
false positives and bisection issues when QEMU is switched to '\0' padding.
As a result, the testcase ceased to test values that are shorter than the
maximum possible length.

Update the testcase to make sure that it's testing shorter IDs like it
used to before [2].
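
For illustration, here is a small sketch of why strncmp() suits a '\0'-padded fixed-width field. The helper name is hypothetical; the field widths (6 and 8 bytes) are the ones used in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Compare a fixed-width, possibly NUL-padded table field against a
 * C string. strncmp() stops at the first NUL, so an ID shorter than
 * the field still matches when the remainder is '\0' padding, and it
 * never reads more than width bytes from either argument.
 */
static bool oem_field_matches(const unsigned char *field, size_t width,
                              const char *expected)
{
    assert(strlen(expected) <= width);
    return strncmp((const char *)field, expected, width) == 0;
}
```

A memcmp() over the full width would instead require the expected string to be padded by hand to exactly the field size.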

1) "tests: acpi: manually pad OEM_ID/OEM_TABLE_ID for test_oem_fields() test"
2) 602b458201 ("acpi: Permit OEM ID and OEM table ID fields to be changed")

Signed-off-by: Igor Mammedov 
---
 tests/qtest/bios-tables-test.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index 90c9f6a0a2..ad536fd7b1 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -71,10 +71,10 @@
 
 #define ACPI_REBUILD_EXPECTED_AML "TEST_ACPI_REBUILD_AML"
 
-#define OEM_ID "TEST  "
-#define OEM_TABLE_ID   "OEM "
-#define OEM_TEST_ARGS  "-machine x-oem-id='" OEM_ID "',x-oem-table-id='" \
-   OEM_TABLE_ID "'"
+#define OEM_ID "TEST"
+#define OEM_TABLE_ID   "OEM"
+#define OEM_TEST_ARGS  "-machine x-oem-id=" OEM_ID ",x-oem-table-id=" \
+   OEM_TABLE_ID
 
 typedef struct {
 bool tcg_only;
@@ -1530,8 +1530,8 @@ static void test_oem_fields(test_data *data)
 continue;
 }
 
-g_assert(memcmp(sdt->aml + 10, OEM_ID, 6) == 0);
-g_assert(memcmp(sdt->aml + 16, OEM_TABLE_ID, 8) == 0);
+g_assert(strncmp((char *)sdt->aml + 10, OEM_ID, 6) == 0);
+g_assert(strncmp((char *)sdt->aml + 16, OEM_TABLE_ID, 8) == 0);
 }
 }
 
-- 
2.31.1

[PATCH v5 0/6] target/arm: Reduced-IPA space and highmem fixes

Here's yet another stab at enabling QEMU on systems with
pathologically reduced IPA ranges such as the Apple M1 (previous
version at [1]). Eventually, we're able to run a KVM guest with more
than just 3GB of RAM on a system with a 36bit IPA space, and at most
123 vCPUs.

This also addresses some pathological QEMU behaviours, where the
highmem property is used as a flag allowing exposure of devices that
can't possibly fit in the PA space of the VM, resulting in a guest
failure.

In the end, we generalise the notion of PA space when exposing
individual devices in the expanded memory map, and treat highmem as
another flavour of PA space restriction.

This series does a few things:

- introduce new attributes to control the enabling of the highmem
  GICv3 redistributors and the highmem PCIe MMIO range

- correctly cap the PA range when highmem is off

- generalise the highmem behaviour to any PA range

- disable each highmem device region that doesn't fit in the PA range

- clean up uses of highmem outside of virt_set_memmap()

This has been tested on an M1-based Mac-mini running Linux v5.16-rc6
with both KVM and TCG.

* From v4: [1]

  - Moved cpu_type_valid() check before we compute the memory map
  - Drop useless MAX() when computing highest_gpa
  - Fixed more deviations from the QEMU coding style
  - Collected Eric's RBs, with thanks

[1]: https://lore.kernel.org/r/20220107163324.2491209-1-...@kernel.org

Marc Zyngier (6):
  hw/arm/virt: Add a control for the highmem PCIe MMIO
  hw/arm/virt: Add a control for the highmem redistributors
  hw/arm/virt: Honor highmem setting when computing the memory map
  hw/arm/virt: Use the PA range to compute the memory map
  hw/arm/virt: Disable highmem devices that don't fit in the PA range
  hw/arm/virt: Drop superfluous checks against highmem

 hw/arm/virt-acpi-build.c | 10 ++--
 hw/arm/virt.c| 98 ++--
 include/hw/arm/virt.h|  5 +-
 3 files changed, 91 insertions(+), 22 deletions(-)

-- 
2.30.2

Re: [PATCH 1/2] hw/virtio: add boilerplate for vhost-user-gpio device



Viresh Kumar  writes:

> This creates the QEMU side of the vhost-user-gpio device which connects
> to the remote daemon. It is based on the vhost-user-i2c code.
>
> Signed-off-by: Viresh Kumar 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée

Re: [PATCH v3 2/3] target/riscv: add support for svinval extension

2022-01-14 Thread Anup Patel

On Fri, Jan 14, 2022 at 7:24 PM Weiwei Li  wrote:
>
> Thanks for your comments.
>
> On 2022/1/14 at 9:40 PM, Anup Patel wrote:
>
> On Fri, Jan 14, 2022 at 7:11 AM Weiwei Li  wrote:
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 
> ---
>  target/riscv/cpu.c  |  1 +
>  target/riscv/cpu.h  |  1 +
>  target/riscv/insn32.decode  |  7 ++
>  target/riscv/insn_trans/trans_svinval.c.inc | 75 +
>  target/riscv/translate.c|  1 +
>  5 files changed, 85 insertions(+)
>  create mode 100644 target/riscv/insn_trans/trans_svinval.c.inc
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index ff6c86c85b..45ac98e06b 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -668,6 +668,7 @@ static Property riscv_cpu_properties[] = {
>  DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
>  DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
>
> +DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false),
>  DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),
>
>  DEFINE_PROP_BOOL("zba", RISCVCPU, cfg.ext_zba, true),
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index d3d17cde82..c3d1845ca1 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -327,6 +327,7 @@ struct RISCVCPU {
>  bool ext_counters;
>  bool ext_ifencei;
>  bool ext_icsr;
> +bool ext_svinval;
>  bool ext_svnapot;
>  bool ext_zfh;
>  bool ext_zfhmin;
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 5bbedc254c..7a0351fde2 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -809,3 +809,10 @@ fcvt_l_h   1100010  00010 . ... . 1010011 @r2_rm
>  fcvt_lu_h  1100010  00011 . ... . 1010011 @r2_rm
>  fcvt_h_l   1101010  00010 . ... . 1010011 @r2_rm
>  fcvt_h_lu  1101010  00011 . ... . 1010011 @r2_rm
> +
> +# *** Svinval Standard Extension ***
> +sinval_vma0001011 . . 000 0 1110011 @sfence_vma
> +sfence_w_inval0001100 0 0 000 0 1110011
> +sfence_inval_ir   0001100 1 0 000 0 1110011
> +hinval_vvma   0011011 . . 000 0 1110011 @hfence_vvma
>
> s/0011011/0010011/
>
> +hinval_gvma   0111011 . . 000 0 1110011 @hfence_gvma
>
> s/0111011/0110011/
>
> Sorry, I couldn't find the encodings for the svinval instructions in the
> spec, so I took them from Spike
> (https://github.com/riscv-software-src/riscv-isa-sim/blob/master/riscv/encoding.h),
> which has the following:
>
> #define MATCH_HINVAL_VVMA 0x3673
> #define MASK_HINVAL_VVMA 0xfe007fff
> #define MATCH_HINVAL_GVMA 0x7673
> #define MASK_HINVAL_GVMA 0xfe007fff
> Are they not the latest encodings?

The code in Spike seems to be buggy but that's a separate issue.

Refer to page 138 of
https://github.com/riscv/riscv-isa-manual/releases/download/draft-20220110-eae4f00/riscv-privileged.pdf
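
To make the funct7 discussion concrete, here is a small sketch that builds the fixed bits of an instruction word from the field layout used by the decode patterns quoted above (funct7, rs2, rs1, funct3=000, rd=00000, opcode=1110011). The helper names are hypothetical, and the funct7 values in the usage note are only the ones suggested in this review.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build the fixed bits of an instruction laid out as
 *   funct7 | rs2 | rs1 | funct3=000 | rd=00000 | opcode=1110011
 * with rs1/rs2 left as zero.
 */
static uint32_t system_insn_match(uint32_t funct7)
{
    assert(funct7 < 0x80);
    return (funct7 << 25) | 0x73u;
}

/* Recover funct7 (bits 31:25) from an encoded instruction word. */
static uint32_t insn_funct7(uint32_t insn)
{
    return insn >> 25;
}
```

For example, funct7 0010011 (0x13) gives a match value of 0x26000073 and 0110011 (0x33) gives 0x66000073, which can be compared against whatever an encoding header claims.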

Regards,
Anup

>
> diff --git a/target/riscv/insn_trans/trans_svinval.c.inc 
> b/target/riscv/insn_trans/trans_svinval.c.inc
> new file mode 100644
> index 00..1dde665661
> --- /dev/null
> +++ b/target/riscv/insn_trans/trans_svinval.c.inc
> @@ -0,0 +1,75 @@
> +/*
> + * RISC-V translation routines for the Svinval Standard Instruction Set.
> + *
> + * Copyright (c) 2020-2021 PLCT lab
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along 
> with
> + * this program.  If not, see .
> + */
> +
> +#define REQUIRE_SVINVAL(ctx) do {\
> +if (!RISCV_CPU(ctx->cs)->cfg.ext_svinval) {  \
> +return false;\
> +}\
> +} while (0)
> +
> +static bool trans_sinval_vma(DisasContext *ctx, arg_sinval_vma *a)
> +{
> +REQUIRE_SVINVAL(ctx);
> +/* Do the same as sfence.vma currently */
> +REQUIRE_EXT(ctx, RVS);
> +#ifndef CONFIG_USER_ONLY
> +gen_helper_tlb_flush(cpu_env);
> +return true;
> +#endif
> +return false;
> +}
> +
> +static bool trans_sfence_w_inval(DisasContext *ctx, arg_sfence_w_inval *a)
> +{
> +REQUIRE_SVINVAL(ctx);
> +REQUIRE_EXT(ctx, RVS);
> +/* Do nothing currently */
> +return true;
> +}
> +
> +static bool trans_sfence_inval_ir(DisasContext *ctx, arg_sfence_inval_ir *a)
> +{
> +REQUIRE_SVINVAL(ctx);
> +REQUIRE_EXT(ctx, RVS);
> +/* Do nothin

[PULL 12/16] vvfat: Fix vvfat_write() for writes before the root directory

The calculation in sector2cluster() is done relative to the offset of
the root directory. Any writes to blocks before the start of the root
directory (in particular, writes to the FAT) result in negative values,
which are not handled correctly in vvfat_write().

This changes sector2cluster() to return a signed value, and makes sure
that vvfat_write() doesn't try to find mappings for negative cluster
numbers. It also clarifies the code in vvfat_write() to make it more obvious
that the cluster numbers can be negative.
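
A minimal sketch of the signed conversion and the guard described above. The reduced struct and the helper names are hypothetical; the formula is the one from sector2cluster() in the diff below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced state: just the two fields the formula needs. */
struct fat_layout {
    int64_t offset_to_root_dir;   /* in sectors */
    int32_t sectors_per_cluster;
};

/*
 * Signed result: negative once the sector lies at least one cluster
 * before the root directory (C division truncates toward zero).
 */
static int32_t sector_to_cluster(const struct fat_layout *l, int64_t sector)
{
    assert(l->sectors_per_cluster > 0);
    return (int32_t)((sector - l->offset_to_root_dir) / l->sectors_per_cluster);
}

/* Count clusters in [first, last] that may be looked up in the mappings. */
static int count_mappable_clusters(int32_t first, int32_t last)
{
    int n = 0;

    for (int32_t i = first; i <= last; i++) {
        if (i >= 0) {
            n++;   /* only non-negative clusters are valid indexes */
        }
    }
    return n;
}
```

With an unsigned return type, the negative values would wrap to very large cluster numbers and the "i >= 0" guard could never reject them.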

Signed-off-by: Kevin Wolf 
Message-Id: <20211209152231.23756-1-kw...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 block/vvfat.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 36e73d4c64..b2b58d93b8 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -882,7 +882,7 @@ static int read_directory(BDRVVVFATState* s, int 
mapping_index)
 return 0;
 }
 
-static inline uint32_t sector2cluster(BDRVVVFATState* s,off_t sector_num)
+static inline int32_t sector2cluster(BDRVVVFATState* s,off_t sector_num)
 {
 return (sector_num - s->offset_to_root_dir) / s->sectors_per_cluster;
 }
@@ -2981,6 +2981,7 @@ static int vvfat_write(BlockDriverState *bs, int64_t 
sector_num,
 {
 BDRVVVFATState *s = bs->opaque;
 int i, ret;
+int first_cluster, last_cluster;
 
 DLOG(checkpoint());
 
@@ -2999,9 +3000,20 @@ DLOG(checkpoint());
 if (sector_num < s->offset_to_fat)
 return -1;
 
-for (i = sector2cluster(s, sector_num);
-i <= sector2cluster(s, sector_num + nb_sectors - 1);) {
-mapping_t* mapping = find_mapping_for_cluster(s, i);
+/*
+ * Values will be negative for writes to the FAT, which is located before
+ * the root directory.
+ */
+first_cluster = sector2cluster(s, sector_num);
+last_cluster = sector2cluster(s, sector_num + nb_sectors - 1);
+
+for (i = first_cluster; i <= last_cluster;) {
+mapping_t *mapping = NULL;
+
+if (i >= 0) {
+mapping = find_mapping_for_cluster(s, i);
+}
+
 if (mapping) {
 if (mapping->read_only) {
 fprintf(stderr, "Tried to write to write-protected file %s\n",
@@ -3041,8 +3053,9 @@ DLOG(checkpoint());
 }
 }
 i = mapping->end;
-} else
+} else {
 i++;
+}
 }
 
 /*
@@ -3056,10 +3069,11 @@ DLOG(fprintf(stderr, "Write to qcow backend: %d + 
%d\n", (int)sector_num, nb_sec
 return ret;
 }
 
-for (i = sector2cluster(s, sector_num);
-i <= sector2cluster(s, sector_num + nb_sectors - 1); i++)
-if (i >= 0)
+for (i = first_cluster; i <= last_cluster; i++) {
+if (i >= 0) {
 s->used_clusters[i] |= USED_ALLOCATED;
+}
+}
 
 DLOG(checkpoint());
 /* TODO: add timeout */
-- 
2.31.1

Re: [PATCH v3 3/3] target/riscv: add support for svpbmt extension

2022-01-14 Thread Anup Patel

On Fri, Jan 14, 2022 at 7:11 AM Weiwei Li  wrote:
>
> It uses two PTE bits, but otherwise has no effect on QEMU, since QEMU is 
> sequentially consistent and doesn't model PMAs currently
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 
> Tested-by: Heiko Stuebner 
> ---
>  target/riscv/cpu.c| 1 +
>  target/riscv/cpu.h| 1 +
>  target/riscv/cpu_bits.h   | 3 +++
>  target/riscv/cpu_helper.c | 9 -
>  4 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 45ac98e06b..4f82bd00a3 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -670,6 +670,7 @@ static Property riscv_cpu_properties[] = {
>
>  DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false),
>  DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),
> +DEFINE_PROP_BOOL("svpbmt", RISCVCPU, cfg.ext_svpbmt, false),
>
>  DEFINE_PROP_BOOL("zba", RISCVCPU, cfg.ext_zba, true),
>  DEFINE_PROP_BOOL("zbb", RISCVCPU, cfg.ext_zbb, true),
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index c3d1845ca1..53f314c752 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -329,6 +329,7 @@ struct RISCVCPU {
>  bool ext_icsr;
>  bool ext_svinval;
>  bool ext_svnapot;
> +bool ext_svpbmt;
>  bool ext_zfh;
>  bool ext_zfhmin;
>
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index bc23e3b523..ee294c1d0b 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -486,7 +486,10 @@ typedef enum {
>  #define PTE_A   0x040 /* Accessed */
>  #define PTE_D   0x080 /* Dirty */
>  #define PTE_SOFT0x300 /* Reserved for Software */
> +#define PTE_RSVD0x1FC0 /* Reserved for future use */
> +#define PTE_PBMT0x6000 /* Page-based memory types */
>  #define PTE_N   0x8000 /* NAPOT translation */
> +#define PTE_ATTR0xFFC0 /* All attributes bits */
>
>  /* Page table PPN shift amount */
>  #define PTE_PPN_SHIFT   10
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 58ab85bca3..f90766e026 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -619,16 +619,23 @@ restart:
>  return TRANSLATE_FAIL;
>  }
>
> -hwaddr ppn = (pte & ~(target_ulong)PTE_N) >> PTE_PPN_SHIFT;
> +hwaddr ppn = (pte & ~(target_ulong)PTE_ATTR) >> PTE_PPN_SHIFT;
>
>  RISCVCPU *cpu = env_archcpu(env);
>  if (!cpu->cfg.ext_svnapot && (pte & PTE_N)) {
>  return TRANSLATE_FAIL;
> +} else if (!cpu->cfg.ext_svpbmt && (pte & PTE_PBMT)) {
> +return TRANSLATE_FAIL;
> +} else if (pte & PTE_RSVD) {
> +return TRANSLATE_FAIL;
>  } else if (!(pte & PTE_V)) {
>  /* Invalid PTE */
>  return TRANSLATE_FAIL;
>  } else if (!(pte & (PTE_R | PTE_W | PTE_X))) {
>  /* Inner PTE, continue walking */
> +if (pte & (PTE_D | PTE_A | PTE_U | PTE_N | PTE_PBMT)) {
> +return TRANSLATE_FAIL;
> +}

I think you should add a patch before PATCH1 to add the following:

if (pte & (PTE_D | PTE_A | PTE_U)) {
return TRANSLATE_FAIL;
}

The current PATCH1 should add PTE_N to the comparison and
this patch can add PTE_PBMT to the comparison.
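
Put together, the end state after all three patches could look roughly like the sketch below. The bit positions for N (bit 63) and PBMT (bits 62:61) are taken from the RISC-V privileged spec rather than from the constants in this patch, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PTE bits per the RISC-V privileged spec (RV64 page-table formats). */
#define PTE_V    (1ULL << 0)
#define PTE_R    (1ULL << 1)
#define PTE_W    (1ULL << 2)
#define PTE_X    (1ULL << 3)
#define PTE_U    (1ULL << 4)
#define PTE_A    (1ULL << 6)
#define PTE_D    (1ULL << 7)
#define PTE_PBMT (3ULL << 61)
#define PTE_N    (1ULL << 63)

/* A valid PTE with R=W=X=0 points to the next level of the page table. */
static bool pte_is_inner(uint64_t pte)
{
    return (pte & PTE_V) && !(pte & (PTE_R | PTE_W | PTE_X));
}

/* Inner PTEs must leave D, A, U, N and PBMT clear; otherwise the walk fails. */
static bool inner_pte_is_well_formed(uint64_t pte)
{
    assert(pte_is_inner(pte));
    return !(pte & (PTE_D | PTE_A | PTE_U | PTE_N | PTE_PBMT));
}
```

Splitting the series as suggested would grow the mask in inner_pte_is_well_formed() one extension at a time.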

>  base = ppn << PGSHIFT;
>  } else if ((pte & (PTE_R | PTE_W | PTE_X)) == PTE_W) {
>  /* Reserved leaf PTE flags: PTE_W */
> --
> 2.17.1
>

Apart from the minor comment above, it looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

Re: [PATCH v3 2/3] target/riscv: add support for svinval extension

2022-01-14 Thread Weiwei Li


Thanks for your comments.

On 2022/1/14 at 9:40 PM, Anup Patel wrote:

On Fri, Jan 14, 2022 at 7:11 AM Weiwei Li  wrote:

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
  target/riscv/cpu.c  |  1 +
  target/riscv/cpu.h  |  1 +
  target/riscv/insn32.decode  |  7 ++
  target/riscv/insn_trans/trans_svinval.c.inc | 75 +
  target/riscv/translate.c|  1 +
  5 files changed, 85 insertions(+)
  create mode 100644 target/riscv/insn_trans/trans_svinval.c.inc

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index ff6c86c85b..45ac98e06b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -668,6 +668,7 @@ static Property riscv_cpu_properties[] = {
  DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
  DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),

+DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false),
  DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),

  DEFINE_PROP_BOOL("zba", RISCVCPU, cfg.ext_zba, true),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index d3d17cde82..c3d1845ca1 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -327,6 +327,7 @@ struct RISCVCPU {
  bool ext_counters;
  bool ext_ifencei;
  bool ext_icsr;
+bool ext_svinval;
  bool ext_svnapot;
  bool ext_zfh;
  bool ext_zfhmin;
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5bbedc254c..7a0351fde2 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -809,3 +809,10 @@ fcvt_l_h   1100010  00010 . ... . 1010011 @r2_rm
  fcvt_lu_h  1100010  00011 . ... . 1010011 @r2_rm
  fcvt_h_l   1101010  00010 . ... . 1010011 @r2_rm
  fcvt_h_lu  1101010  00011 . ... . 1010011 @r2_rm
+
+# *** Svinval Standard Extension ***
+sinval_vma0001011 . . 000 0 1110011 @sfence_vma
+sfence_w_inval0001100 0 0 000 0 1110011
+sfence_inval_ir   0001100 1 0 000 0 1110011
+hinval_vvma   0011011 . . 000 0 1110011 @hfence_vvma

s/0011011/0010011/


+hinval_gvma   0111011 . . 000 0 1110011 @hfence_gvma

s/0111011/0110011/


Sorry, I couldn't find the encodings for the svinval instructions in the
spec, so I took them from Spike
(https://github.com/riscv-software-src/riscv-isa-sim/blob/master/riscv/encoding.h),
which has the following:

#define MATCH_HINVAL_VVMA 0x3673
#define MASK_HINVAL_VVMA 0xfe007fff
#define MATCH_HINVAL_GVMA 0x7673
#define MASK_HINVAL_GVMA 0xfe007fff

Are they not the latest encodings?

diff --git a/target/riscv/insn_trans/trans_svinval.c.inc 
b/target/riscv/insn_trans/trans_svinval.c.inc
new file mode 100644
index 00..1dde665661
--- /dev/null
+++ b/target/riscv/insn_trans/trans_svinval.c.inc
@@ -0,0 +1,75 @@
+/*
+ * RISC-V translation routines for the Svinval Standard Instruction Set.
+ *
+ * Copyright (c) 2020-2021 PLCT lab
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#define REQUIRE_SVINVAL(ctx) do {\
+if (!RISCV_CPU(ctx->cs)->cfg.ext_svinval) {  \
+return false;\
+}\
+} while (0)
+
+static bool trans_sinval_vma(DisasContext *ctx, arg_sinval_vma *a)
+{
+REQUIRE_SVINVAL(ctx);
+/* Do the same as sfence.vma currently */
+REQUIRE_EXT(ctx, RVS);
+#ifndef CONFIG_USER_ONLY
+gen_helper_tlb_flush(cpu_env);
+return true;
+#endif
+return false;
+}
+
+static bool trans_sfence_w_inval(DisasContext *ctx, arg_sfence_w_inval *a)
+{
+REQUIRE_SVINVAL(ctx);
+REQUIRE_EXT(ctx, RVS);
+/* Do nothing currently */
+return true;
+}
+
+static bool trans_sfence_inval_ir(DisasContext *ctx, arg_sfence_inval_ir *a)
+{
+REQUIRE_SVINVAL(ctx);
+REQUIRE_EXT(ctx, RVS);
+/* Do nothing currently */
+return true;
+}
+
+static bool trans_hinval_vvma(DisasContext *ctx, arg_hinval_vvma *a)
+{
+REQUIRE_SVINVAL(ctx);
+/* Do the same as hfence.vvma currently */
+REQUIRE_EXT(ctx, RVH);
+#ifndef CONFIG_USER_ONLY
+gen_helper_hyp_tlb_flush(cpu_env);
+return true;
+#endif
+return false;
+}
+
+static bool trans_hinval_gvma(DisasContext *ctx, arg_hinval_gvma *a)
+{
+REQUIRE_SVINVAL(ctx);
+/* Do the same as hfence.gvma currently */
+REQUIRE_

[PULL 13/16] iotests: Test qemu-img convert of zeroed data cluster

This demonstrates what happens when the block status changes in
sub-min_sparse granularity, but all of the parts are zeroed out. The
alignment logic in is_allocated_sectors() prevents the target image
from remaining fully sparse as expected, and instead turns it into a
data cluster of explicit zeros.

Signed-off-by: Kevin Wolf 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20211217164654.1184218-2-vsement...@virtuozzo.com>
Tested-by: Peter Lieven 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/122 |  1 +
 tests/qemu-iotests/122.out | 10 --
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/122 b/tests/qemu-iotests/122
index efb260d822..be0f6b79e5 100755
--- a/tests/qemu-iotests/122
+++ b/tests/qemu-iotests/122
@@ -251,6 +251,7 @@ $QEMU_IO -c "write -P 0 0 64k" "$TEST_IMG" 2>&1 | 
_filter_qemu_io | _filter_test
 $QEMU_IO -c "write 0 1k" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
 $QEMU_IO -c "write 8k 1k" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
 $QEMU_IO -c "write 17k 1k" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -c "write -P 0 65k 1k" "$TEST_IMG" 2>&1 | _filter_qemu_io | 
_filter_testdir
 
 for min_sparse in 4k 8k; do
 echo
diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
index 8fbdac2b39..69b8e8b803 100644
--- a/tests/qemu-iotests/122.out
+++ b/tests/qemu-iotests/122.out
@@ -192,6 +192,8 @@ wrote 1024/1024 bytes at offset 8192
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 1024/1024 bytes at offset 17408
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 66560
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 convert -S 4k
 [{ "start": 0, "length": 4096, "depth": 0, "present": true, "zero": false, 
"data": true, "offset": OFFSET},
@@ -199,7 +201,9 @@ convert -S 4k
 { "start": 8192, "length": 4096, "depth": 0, "present": true, "zero": false, 
"data": true, "offset": OFFSET},
 { "start": 12288, "length": 4096, "depth": 0, "present": false, "zero": true, 
"data": false},
 { "start": 16384, "length": 4096, "depth": 0, "present": true, "zero": false, 
"data": true, "offset": OFFSET},
-{ "start": 20480, "length": 67088384, "depth": 0, "present": false, "zero": 
true, "data": false}]
+{ "start": 20480, "length": 46080, "depth": 0, "present": false, "zero": true, 
"data": false},
+{ "start": 66560, "length": 1024, "depth": 0, "present": true, "zero": false, 
"data": true, "offset": OFFSET},
+{ "start": 67584, "length": 67041280, "depth": 0, "present": false, "zero": 
true, "data": false}]
 
 convert -c -S 4k
 [{ "start": 0, "length": 1024, "depth": 0, "present": true, "zero": false, 
"data": true},
@@ -211,7 +215,9 @@ convert -c -S 4k
 
 convert -S 8k
 [{ "start": 0, "length": 24576, "depth": 0, "present": true, "zero": false, 
"data": true, "offset": OFFSET},
-{ "start": 24576, "length": 67084288, "depth": 0, "present": false, "zero": 
true, "data": false}]
+{ "start": 24576, "length": 41984, "depth": 0, "present": false, "zero": true, 
"data": false},
+{ "start": 66560, "length": 1024, "depth": 0, "present": true, "zero": false, 
"data": true, "offset": OFFSET},
+{ "start": 67584, "length": 67041280, "depth": 0, "present": false, "zero": 
true, "data": false}]
 
 convert -c -S 8k
 [{ "start": 0, "length": 1024, "depth": 0, "present": true, "zero": false, 
"data": true},
-- 
2.31.1

[PULL 15/16] block: drop BLK_PERM_GRAPH_MOD

From: Vladimir Sementsov-Ogievskiy 

First, this permission never protected a node from being changed, as
generic child-replacing functions don't check it.

Second, it's a strange thing: it represents a permission of the parent node
to change its child. But generally, children are replaced by different
mechanisms, like jobs or QMP commands, not by nodes.

Graph-mod permission is hard to understand. All other permissions
describe operations which are done by the parent node on its child: read,
write, resize. Graph modification operations are something completely
different.

The only place where BLK_PERM_GRAPH_MOD is used as "perm" (not shared
perm) is mirror_start_job, for s->target. Still, modern code should use
bdrv_freeze_backing_chain() to protect from graph modification; if we
don't do it somewhere, it may be considered a bug. So, it's a bit
risky to drop GRAPH_MOD, and analyzing the possible loss of protection
is hard. But one day we should do it, so let's do it now.

One more bit of information is that locking the corresponding byte in
file-posix doesn't make sense at all.
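
As a rough illustration of the resulting bit layout (a sketch, not the
actual header): only BLK_PERM_RESIZE and the old and new BLK_PERM_ALL
values are visible in this patch, so the values 0x01, 0x02 and 0x04 for
the other bits are assumptions here. LEGACY_GRAPH_MOD_BIT and
perm_drop_legacy_bits() are hypothetical names used only for this
example.

```c
#include <assert.h>
#include <stdint.h>

enum {
    BLK_PERM_CONSISTENT_READ = 0x01, /* assumed value */
    BLK_PERM_WRITE           = 0x02, /* assumed value */
    BLK_PERM_WRITE_UNCHANGED = 0x04, /* assumed value */
    BLK_PERM_RESIZE          = 0x08, /* as shown in the patch */

    /* Hypothetical name for the retired 0x10 bit (was BLK_PERM_GRAPH_MOD). */
    LEGACY_GRAPH_MOD_BIT     = 0x10,

    BLK_PERM_ALL             = 0x0f, /* new value from the patch */
};

/*
 * Hypothetical helper: mask a permission set that may still carry the
 * retired bit (for example, a value written down by older code) so that
 * only currently defined permissions remain.
 */
static uint64_t perm_drop_legacy_bits(uint64_t perm)
{
    return perm & BLK_PERM_ALL;
}
```

With these values, the old mask 0x1f reduces to 0x0f, and a new
permission bit would have to avoid 0x10 to stay clear of the byte that
older QEMU versions may still lock.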

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20210902093754.2352-1-vsement...@virtuozzo.com>
Signed-off-by: Kevin Wolf 
---
 qapi/block-core.json  |  7 ++-
 include/block/block.h |  9 +
 block.c   |  7 +--
 block/commit.c|  1 -
 block/mirror.c| 15 +++
 hw/block/block.c  |  3 +--
 scripts/render_block_graph.py |  1 -
 tests/qemu-iotests/273.out|  4 
 8 files changed, 12 insertions(+), 35 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index bd0b285245..9a5a3641d0 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1878,14 +1878,11 @@
 #
 # @resize: This permission is required to change the size of a block node.
 #
-# @graph-mod: This permission is required to change the node that this
-# BdrvChild points to.
-#
 # Since: 4.0
 ##
 { 'enum': 'BlockPermission',
-  'data': [ 'consistent-read', 'write', 'write-unchanged', 'resize',
-'graph-mod' ] }
+  'data': [ 'consistent-read', 'write', 'write-unchanged', 'resize' ] }
+
 ##
 # @XDbgBlockGraphEdge:
 #
diff --git a/include/block/block.h b/include/block/block.h
index e5dd22b034..9d4050220b 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -269,12 +269,13 @@ enum {
 BLK_PERM_RESIZE = 0x08,
 
 /**
- * This permission is required to change the node that this BdrvChild
- * points to.
+ * There was a now-removed bit BLK_PERM_GRAPH_MOD, with value of 0x10. QEMU
+ * 6.1 and earlier may still lock the corresponding byte in block/file-posix
+ * locking.  So, implementing some new permission should be very careful to
+ * not interfere with this old unused thing.
  */
-BLK_PERM_GRAPH_MOD  = 0x10,
 
-BLK_PERM_ALL= 0x1f,
+BLK_PERM_ALL= 0x0f,
 
 DEFAULT_PERM_PASSTHROUGH= BLK_PERM_CONSISTENT_READ
  | BLK_PERM_WRITE
diff --git a/block.c b/block.c
index 10346b5011..7b3ce415d8 100644
--- a/block.c
+++ b/block.c
@@ -2485,7 +2485,6 @@ char *bdrv_perm_names(uint64_t perm)
 { BLK_PERM_WRITE,   "write" },
 { BLK_PERM_WRITE_UNCHANGED, "write unchanged" },
 { BLK_PERM_RESIZE,  "resize" },
-{ BLK_PERM_GRAPH_MOD,   "change children" },
 { 0, NULL }
 };
 
@@ -2601,8 +2600,7 @@ static void bdrv_default_perms_for_cow(BlockDriverState *bs, BdrvChild *c,
 shared = 0;
 }
 
-shared |= BLK_PERM_CONSISTENT_READ | BLK_PERM_GRAPH_MOD |
-  BLK_PERM_WRITE_UNCHANGED;
+shared |= BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
 
 if (bs->open_flags & BDRV_O_INACTIVE) {
 shared |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
@@ -2720,7 +2718,6 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
 [BLOCK_PERMISSION_WRITE]= BLK_PERM_WRITE,
 [BLOCK_PERMISSION_WRITE_UNCHANGED]  = BLK_PERM_WRITE_UNCHANGED,
 [BLOCK_PERMISSION_RESIZE]   = BLK_PERM_RESIZE,
-[BLOCK_PERMISSION_GRAPH_MOD]= BLK_PERM_GRAPH_MOD,
 };
 
 QEMU_BUILD_BUG_ON(ARRAY_SIZE(permissions) != BLOCK_PERMISSION__MAX);
@@ -5546,8 +5543,6 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
 update_inherits_from = bdrv_inherits_from_recursive(base, explicit_top);
 
 /* success - we can delete the intermediate states, and link top->base */
-/* TODO Check graph modification op blockers (BLK_PERM_GRAPH_MOD) once
- * we've figured out how they should work. */
 if (!backing_file_str) {
 bdrv_refresh_filename(base);
 backing_file_str = base->filename;
diff --git a/block/commit.c b/block/commit.c
index 10cc5ff451..b1fc7b908b 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -370,7 +370,6 @@ void commit_start(const char

[PULL 14/16] qemu-img: make is_allocated_sectors() more efficient

From: Vladimir Sementsov-Ogievskiy 

Consider the case when the whole buffer is zero and the end is
unaligned.

If i <= tail, we return 1 and do one unaligned WRITE, so RMW
(read-modify-write) happens.

If i > tail, we do one aligned WRITE_ZERO (or skip if the target is
zeroed) and again one unaligned WRITE, so RMW happens.

Let's do better: don't fragment the whole-zero buffer and report it as
ZERO: in case of a zeroed target we just do nothing and avoid RMW. If
the target is not zeroed, one unaligned WRITE_ZERO should not be much
worse than one unaligned WRITE.
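
The decision logic after this change can be sketched roughly as
follows. This is a simplified model, not the qemu-img.c code itself:
it takes a per-sector array of "is zero" flags instead of scanning a
byte buffer, and the name run_is_data() is hypothetical. The alignment
is assumed to be a power of two, counted in sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * zero[k] is true if sector k of the buffer is all zeroes.
 * Returns true if the first *pnum sectors should be written as data,
 * false if they can be treated as zeroes.
 */
static bool run_is_data(const bool *zero, int n, int *pnum,
                        int64_t sector_num, int alignment)
{
    bool is_zero;
    int i, tail;

    if (n <= 0) {
        *pnum = 0;
        return false;
    }

    /* Find the length of the initial run of identical sectors. */
    is_zero = zero[0];
    for (i = 1; i < n; i++) {
        if (zero[i] != is_zero) {
            break;
        }
    }

    /* The whole buffer is uniform: report it as one chunk, unsplit. */
    if (i == n) {
        *pnum = i;
        return !is_zero;
    }

    tail = (int)((sector_num + i) & (alignment - 1));
    if (tail) {
        if (is_zero && i <= tail) {
            /* Short zero run before data: write it as data. */
            is_zero = false;
        }
        if (!is_zero) {
            /* Align the end of a data run up, within the buffer. */
            i += alignment - tail;
            if (i > n) {
                i = n;
            }
        } else {
            /* Align the end of a zero run down. */
            i -= tail;
        }
    }

    *pnum = i;
    return !is_zero;
}
```

The early return for i == n is the new part: a fully zero buffer with
an unaligned end is reported as a single zero run instead of being
split at the alignment boundary.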

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20211217164654.1184218-3-vsement...@virtuozzo.com>
Tested-by: Peter Lieven 
Signed-off-by: Kevin Wolf 
---
 qemu-img.c | 23 +++
 tests/qemu-iotests/122.out |  8 ++--
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 21ba1e6800..6fe2466032 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1171,19 +1171,34 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
 }
 }
 
+if (i == n) {
+/*
+ * The whole buf is the same.
+ * No reason to split it into chunks, so return now.
+ */
+*pnum = i;
+return !is_zero;
+}
+
 tail = (sector_num + i) & (alignment - 1);
 if (tail) {
 if (is_zero && i <= tail) {
-/* treat unallocated areas which only consist
- * of a small tail as allocated. */
+/*
+ * For sure next sector after i is data, and it will rewrite this
+ * tail anyway due to RMW. So, let's just write data now.
+ */
 is_zero = false;
 }
 if (!is_zero) {
-/* align up end offset of allocated areas. */
+/* If possible, align up end offset of allocated areas. */
 i += alignment - tail;
 i = MIN(i, n);
 } else {
-/* align down end offset of zero areas. */
+/*
+ * For sure next sector after i is data, and it will rewrite this
+ * tail anyway due to RMW. Better is avoid RMW and write zeroes up
+ * to aligned bound.
+ */
 i -= tail;
 }
 }
diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
index 69b8e8b803..e18766e167 100644
--- a/tests/qemu-iotests/122.out
+++ b/tests/qemu-iotests/122.out
@@ -201,9 +201,7 @@ convert -S 4k
 { "start": 8192, "length": 4096, "depth": 0, "present": true, "zero": false, "data": true, "offset": OFFSET},
 { "start": 12288, "length": 4096, "depth": 0, "present": false, "zero": true, "data": false},
 { "start": 16384, "length": 4096, "depth": 0, "present": true, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 20480, "length": 46080, "depth": 0, "present": false, "zero": true, "data": false},
-{ "start": 66560, "length": 1024, "depth": 0, "present": true, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 67584, "length": 67041280, "depth": 0, "present": false, "zero": true, "data": false}]
+{ "start": 20480, "length": 67088384, "depth": 0, "present": false, "zero": true, "data": false}]
 
 convert -c -S 4k
 [{ "start": 0, "length": 1024, "depth": 0, "present": true, "zero": false, "data": true},
@@ -215,9 +213,7 @@ convert -c -S 4k
 
 convert -S 8k
 [{ "start": 0, "length": 24576, "depth": 0, "present": true, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 24576, "length": 41984, "depth": 0, "present": false, "zero": true, "data": false},
-{ "start": 66560, "length": 1024, "depth": 0, "present": true, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 67584, "length": 67041280, "depth": 0, "present": false, "zero": true, "data": false}]
+{ "start": 24576, "length": 67084288, "depth": 0, "present": false, "zero": true, "data": false}]
 
 convert -c -S 8k
 [{ "start": 0, "length": 1024, "depth": 0, "present": true, "zero": false, "data": true},
-- 
2.31.1

[PULL 11/16] vvfat: Fix size of temporary qcow file

The size of the qcow file was calculated so that only the FAT partition
would fit on it, but not the whole disk. However, offsets relative to
the whole disk are used to access it, so increase its size to be large
enough for that.
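
To make the size difference concrete, here is a small sketch of the two
calculations. The struct and function names are hypothetical, and
SECTOR_SIZE stands in for BDRV_SECTOR_SIZE (512 bytes); the real code
keeps these values in the vvfat state and in bs->total_sectors.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* stands in for BDRV_SECTOR_SIZE */

/* Hypothetical holder for the geometry used by the example. */
struct disk_geometry {
    uint32_t cyls;
    uint32_t heads;
    uint32_t secs;
    uint32_t offset_to_bootsector; /* sectors before the FAT partition */
};

/* New behaviour: size the qcow file for the whole disk. */
static uint64_t whole_disk_bytes(const struct disk_geometry *g)
{
    return (uint64_t)g->cyls * g->heads * g->secs * SECTOR_SIZE;
}

/* Old behaviour: size it for the FAT partition only. */
static uint64_t fat_partition_bytes(const struct disk_geometry *g)
{
    uint64_t total = (uint64_t)g->cyls * g->heads * g->secs;

    return (total - g->offset_to_bootsector) * SECTOR_SIZE;
}
```

The old size falls short by offset_to_bootsector sectors, so an access
at a whole-disk offset near the end of the disk would land past the end
of the qcow file.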

Signed-off-by: Kevin Wolf 
Message-Id: <20211209151815.23495-1-kw...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 block/vvfat.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 5dacc6cfac..36e73d4c64 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1230,6 +1230,7 @@ static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
  dirname, cyls, heads, secs));
 
 s->sector_count = cyls * heads * secs - s->offset_to_bootsector;
+bs->total_sectors = cyls * heads * secs;
 
 if (qemu_opt_get_bool(opts, "rw", false)) {
 if (!bdrv_is_read_only(bs)) {
@@ -1250,8 +1251,6 @@ static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
 }
 }
 
-bs->total_sectors = cyls * heads * secs;
-
 if (init_directories(s, dirname, heads, secs, errp)) {
 ret = -EIO;
 goto fail;
@@ -3147,8 +3146,8 @@ static int enable_write_target(BlockDriverState *bs, Error **errp)
 }
 
 opts = qemu_opts_create(bdrv_qcow->create_opts, NULL, 0, &error_abort);
-qemu_opt_set_number(opts, BLOCK_OPT_SIZE, s->sector_count * 512,
-&error_abort);
+qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
+bs->total_sectors * BDRV_SECTOR_SIZE, &error_abort);
 qemu_opt_set(opts, BLOCK_OPT_BACKING_FILE, "fat:", &error_abort);
 
 ret = bdrv_create(bdrv_qcow, s->qcow_filename, opts, errp);
-- 
2.31.1

[PATCH v5 5/6] hw/arm/virt: Disable highmem devices that don't fit in the PA range

In order to only keep the highmem devices that actually fit in
the PA range, check their location against the range and update
highest_gpa if they fit. If they don't, mark them as disabled.
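
The per-device check can be sketched in isolation as below. This is a
simplified model under stated assumptions, not the hw/arm/virt.c code:
struct region and place_high_regions() are hypothetical, placement
starts directly at memtop (the real code may start higher), region
sizes are powers of two, and pa_bits is less than 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

/* Hypothetical stand-in for one high memory map entry. */
struct region {
    uint64_t base;
    uint64_t size;
    bool enabled;
};

/*
 * Place each region at the next naturally aligned base, disable the
 * ones that end above the PA limit, and return the highest guest
 * physical address that is actually in use.
 */
static uint64_t place_high_regions(struct region *r, int count,
                                   uint64_t memtop, int pa_bits)
{
    uint64_t limit = 1ULL << pa_bits; /* assumes pa_bits < 64 */
    uint64_t base = memtop;
    uint64_t highest_gpa = memtop - 1; /* RAM is known to fit */

    for (int i = 0; i < count; i++) {
        bool fits;

        base = ROUND_UP(base, r[i].size);
        r[i].base = base;

        fits = (base + r[i].size) <= limit;
        if (fits) {
            highest_gpa = base + r[i].size - 1;
        }
        /* A region stays enabled only if it was enabled and it fits. */
        r[i].enabled = r[i].enabled && fits;

        base += r[i].size;
    }

    return highest_gpa;
}
```

Unlike the previous all-or-nothing check on the final base, this keeps
the regions that fit and only drops the ones that do not.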

Signed-off-by: Marc Zyngier 
---
 hw/arm/virt.c | 34 --
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index a427676b50..053791cc44 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1712,21 +1712,43 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 base = vms->memmap[VIRT_MEM].base + LEGACY_RAMLIMIT_BYTES;
 }
 
+/* We know for sure that at least the memory fits in the PA space */
+vms->highest_gpa = memtop - 1;
+
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
 hwaddr size = extended_memmap[i].size;
+bool fits;
 
 base = ROUND_UP(base, size);
 vms->memmap[i].base = base;
 vms->memmap[i].size = size;
+
+/*
+ * Check each device to see if they fit in the PA space,
+ * moving highest_gpa as we go.
+ *
+ * For each device that doesn't fit, disable it.
+ */
+fits = (base + size) <= BIT_ULL(pa_bits);
+if (fits) {
+vms->highest_gpa = base + size - 1;
+}
+
+switch (i) {
+case VIRT_HIGH_GIC_REDIST2:
+vms->highmem_redists &= fits;
+break;
+case VIRT_HIGH_PCIE_ECAM:
+vms->highmem_ecam &= fits;
+break;
+case VIRT_HIGH_PCIE_MMIO:
+vms->highmem_mmio &= fits;
+break;
+}
+
 base += size;
 }
 
-/*
- * If base fits within pa_bits, all good. If it doesn't, limit it
- * to the end of RAM, which is guaranteed to fit within pa_bits.
- */
-vms->highest_gpa = (base <= BIT_ULL(pa_bits) ? base : memtop) - 1;
-
 if (device_memory_size > 0) {
 ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
 ms->device_memory->base = device_memory_base;
-- 
2.30.2

[PULL 10/16] iotests/308: Fix for CAP_DAC_OVERRIDE