date:20220927

Re: [PATCH v2 07/11] acpi/tests/bits: add python test that exercizes QEMU bios tables using biosbits

2022-09-27 Thread Michael S. Tsirkin

On Wed, Sep 28, 2022 at 01:10:15AM +0200, Paolo Bonzini wrote:
> 
> 
> On Wed, Sep 28, 2022, 00:09 Michael S. Tsirkin wrote:
> 
> On Tue, Sep 27, 2022 at 11:44:56PM +0200, Paolo Bonzini wrote:
> > I also second the idea of using avocado instead of pytest, by the way.
> >
> > Paolo
> 
> I do not think this is a good fit for bios tests.
> bios tests are intended for a wide audience of ACPI developers
> across a variety of host systems. They basically do not need anything
> from the host and they need to be super easy to configure
> since we have lots of drive through contributors.
> 
> 
> The setup would be the same, with avocado installed in a virtual environment
> via pip. It doesn't need to be set up outside, neither with distro packages
> nor in ~/.local, and especially it is not necessary to deal with avocado-vt.
> 
> Paolo

Hmm, good point.

-- 
MST

Re: [PATCH v7 2/2] i386: Add notify VM exit support

2022-09-27 Thread Paolo Bonzini

On Wed, Sep 28, 2022, 04:21 Chenyi Qiang wrote:

> >> +warn_report_once("KVM: encounter a notify exit with %svalid context in"
> >> + " guest. It means there can be possible misbehaves in"
> >> + " guest, please have a look.",
> >> + ctx_invalid ? "in" : "");
> >
> > The warning should be unconditional if the context is invalid.
> >
>
> In valid context case, the warning can also notify the admin that the
> guest misbehaves. Is it necessary to remove it?
>

You can keep it, but it should be separated so that the invalid-context
case uses warn_report instead of warn_report_once.
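
A minimal sketch of that split, assuming nothing about the final patch. The
two stub_* functions are stand-ins written for this example (the real
warn_report() and warn_report_once() come from QEMU's error-report header and
take printf-style arguments), report_notify_exit() is a hypothetical helper,
and the message strings are shortened for illustration:

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts emitted warnings so the behaviour can be checked. */
static int warnings_emitted;

/* Stand-in for warn_report(): prints every time it is called. */
static void stub_warn_report(const char *msg)
{
    warnings_emitted++;
    fprintf(stderr, "warning: %s\n", msg);
}

/* Stand-in for warn_report_once(): prints only on the first call. */
static void stub_warn_report_once(const char *msg)
{
    static bool printed;

    if (!printed) {
        printed = true;
        stub_warn_report(msg);
    }
}

/* Hypothetical helper: report a notify VM exit. */
static void report_notify_exit(bool ctx_invalid)
{
    if (ctx_invalid) {
        /* Invalid context: warn unconditionally, every time. */
        stub_warn_report("KVM: notify exit with invalid context in guest");
    } else {
        /* Valid context: still useful to the admin, but only once. */
        stub_warn_report_once("KVM: notify exit with valid context in guest");
    }
}
```

With this shape, repeated valid-context exits produce a single warning, while
each invalid-context exit is reported.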

Paolo


> >> +object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW,
> "uint32_t",
> >
> > uint32 (not uint32_t)
> >
>
> ...
>
> >> +  x86_machine_get_notify_window,
> >> +  x86_machine_set_notify_window, NULL,
> >> NULL);
> >> +object_class_property_set_description(oc,
> X86_MACHINE_NOTIFY_WINDOW,
> >> +"Set the notify window required by notify VM exit");
> >
> > "Clock cycles without an event window after which a notification VM exit
> > occurs"
> >
>
> Will Fix it. Thanks a lot!
>
> > Thanks,
> >
> > Paolo
> >
> >  From a5cb704991cfcda19a33c622833b69a8f6928530 Mon Sep 17 00:00:00 2001
> > From: Paolo Bonzini 
> > Date: Tue, 27 Sep 2022 15:20:16 +0200
> > Subject: [PATCH] kvm: allow target-specific accelerator properties
> >
> > Several hypervisor capabilities in KVM are target-specific.  When exposed
> > to QEMU users as accelerator properties (i.e. -accel kvm,prop=value),
> they
> > should not be available for all targets.
> >
> > Add a hook for targets to add their own properties to -accel kvm; for
> > now no such property is defined.
> >
> > Signed-off-by: Paolo Bonzini 
> >
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index 5acab1767f..f90c5cb285 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -3737,6 +3737,8 @@ static void kvm_accel_class_init(ObjectClass *oc,
> > void *data)
> >   NULL, NULL);
> >   object_class_property_set_description(oc, "dirty-ring-size",
> >   "Size of KVM dirty page ring buffer (default: 0, i.e. use
> > bitmap)");
> > +
> > +kvm_arch_accel_class_init(oc);
> >   }
> >
> >   static const TypeInfo kvm_accel_type = {
> > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> > index efd6dee818..50868ebf60 100644
> > --- a/include/sysemu/kvm.h
> > +++ b/include/sysemu/kvm.h
> > @@ -353,6 +353,8 @@ bool kvm_device_supported(int vmfd, uint64_t type);
> >
> >   extern const KVMCapabilityInfo kvm_arch_required_capabilities[];
> >
> > +void kvm_arch_accel_class_init(ObjectClass *oc);
> > +
> >   void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run);
> >   MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run);
> >
> > diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > index e5c1bd50d2..d21603cf28 100644
> > --- a/target/arm/kvm.c
> > +++ b/target/arm/kvm.c
> > @@ -1056,3 +1056,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
> >   {
> >   return true;
> >   }
> > +
> > +void kvm_arch_accel_class_init(ObjectClass *oc)
> > +{
> > +}
> > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > index 21880836a6..22b3b37193 100644
> > --- a/target/i386/kvm/kvm.c
> > +++ b/target/i386/kvm/kvm.c
> > @@ -5472,3 +5472,7 @@ void kvm_request_xsave_components(X86CPU *cpu,
> > uint64_t mask)
> >   mask &= ~BIT_ULL(bit);
> >   }
> >   }
> > +
> > +void kvm_arch_accel_class_init(ObjectClass *oc)
> > +{
> > +}
> > diff --git a/target/mips/kvm.c b/target/mips/kvm.c
> > index caf70decd2..bcb8e06b2c 100644
> > --- a/target/mips/kvm.c
> > +++ b/target/mips/kvm.c
> > @@ -1294,3 +1294,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
> >   {
> >   return true;
> >   }
> > +
> > +void kvm_arch_accel_class_init(ObjectClass *oc)
> > +{
> > +}
> > diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> > index 466d0d2f4c..7c25348b7b 100644
> > --- a/target/ppc/kvm.c
> > +++ b/target/ppc/kvm.c
> > @@ -2966,3 +2966,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
> >   {
> >   return true;
> >   }
> > +
> > +void kvm_arch_accel_class_init(ObjectClass *oc)
> > +{
> > +}
> > diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
> > index 70b4cff06f..30f21453d6 100644
> > --- a/target/riscv/kvm.c
> > +++ b/target/riscv/kvm.c
> > @@ -532,3 +532,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
> >   {
> >   return true;
> >   }
> > +
> > +void kvm_arch_accel_class_init(ObjectClass *oc)
> > +{
> > +}
> > diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
> > index 7bd8db0e7b..840af34576 100644
> > --- a/target/s390x/kvm/kvm.c
> > +++ b/target/s390x/kvm/kvm.c
> > @@ -2574,3 +2574,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
> >   {
> >   return true;
> >   }
> > +
> > +void kvm_arch_accel_class_init(ObjectClass *oc)
> > +{
> > +}
> >
>
>

Re: [PATCH v5 02/12] blkio: add libblkio block driver

2022-09-27 Thread Markus Armbruster

Stefan Hajnoczi  writes:

> libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
> high-performance disk I/O. It currently supports io_uring,
> virtio-blk-vhost-user, and virtio-blk-vhost-vdpa with additional drivers
> under development.
>
> One of the reasons for developing libblkio is that other applications
> besides QEMU can use it. This will be particularly useful for
> virtio-blk-vhost-user which applications may wish to use for connecting
> to qemu-storage-daemon.
>
> libblkio also gives us an opportunity to develop in Rust behind a C API
> that is easy to consume from QEMU.
>
> This commit adds io_uring, virtio-blk-vhost-user, and
> virtio-blk-vhost-vdpa BlockDrivers to QEMU using libblkio. It will be
> easy to add other libblkio drivers since they will share the majority of
> code.
>
> For now I/O buffers are copied through bounce buffers if the libblkio
> driver requires it. Later commits add an optimization for
> pre-registering guest RAM to avoid bounce buffers.
>
> The syntax is:
>
>   --blockdev io_uring,node-name=drive0,filename=test.img,readonly=on|off,cache.direct=on|off
>
> and:
>
>   --blockdev virtio-blk-vhost-vdpa,node-name=drive0,path=/dev/vdpa...,readonly=on|off
>
> Signed-off-by: Stefan Hajnoczi 
> Acked-by: Markus Armbruster 

[...]

> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index f21fa235f2..5aed0dd436 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2951,11 +2951,16 @@
>  'file', 'snapshot-access', 'ftp', 'ftps', 'gluster',
>  {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
>  {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
> -'http', 'https', 'iscsi',
> +'http', 'https',
> +{ 'name': 'io_uring', 'if': 'CONFIG_BLKIO' },
> +'iscsi',
>  'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
>  'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
>  { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
> -'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> +'ssh', 'throttle', 'vdi', 'vhdx',
> +{ 'name': 'virtio-blk-vhost-user', 'if': 'CONFIG_BLKIO' },
> +{ 'name': 'virtio-blk-vhost-vdpa', 'if': 'CONFIG_BLKIO' },
> +'vmdk', 'vpc', 'vvfat' ] }
>  
>  ##
>  # @BlockdevOptionsFile:
> @@ -3678,6 +3683,42 @@
>  '*debug': 'int',
>  '*logfile': 'str' } }
>  
> +##
> +# @BlockdevOptionsIoUring:
> +#
> +# Driver specific block device options for the io_uring backend.
> +#
> +# @filename: path to the image file
> +#
> +# Since: 7.2
> +##
> +{ 'struct': 'BlockdevOptionsIoUring',
> +  'data': { 'filename': 'str' } }
> +
> +##
> +# @BlockdevOptionsVirtioBlkVhostUser:
> +#
> +# Driver specific block device options for the virtio-blk-vhost-user backend.
> +#
> +# @path: path to the vhost-user UNIX domain socket.
> +#
> +# Since: 7.2
> +##
> +{ 'struct': 'BlockdevOptionsVirtioBlkVhostUser',
> +  'data': { 'path': 'str' } }
> +
> +##
> +# @BlockdevOptionsVirtioBlkVhostVdpa:
> +#
> +# Driver specific block device options for the virtio-blk-vhost-vdpa backend.
> +#
> +# @path: path to the vhost-vdpa character device.
> +#
> +# Since: 7.2
> +##
> +{ 'struct': 'BlockdevOptionsVirtioBlkVhostVdpa',
> +  'data': { 'path': 'str' } }
> +

Should these be 'if': 'CONFIG_BLKIO'?

>  ##
>  # @IscsiTransport:
>  #
> @@ -4305,6 +4346,8 @@
> 'if': 'HAVE_HOST_BLOCK_DEVICE' },
>'http':   'BlockdevOptionsCurlHttp',
>'https':  'BlockdevOptionsCurlHttps',
> +  'io_uring':   { 'type': 'BlockdevOptionsIoUring',
> +  'if': 'CONFIG_BLKIO' },
>'iscsi':  'BlockdevOptionsIscsi',
>'luks':   'BlockdevOptionsLUKS',
>'nbd':'BlockdevOptionsNbd',
> @@ -4327,6 +4370,12 @@
>'throttle':   'BlockdevOptionsThrottle',
>'vdi':'BlockdevOptionsGenericFormat',
>'vhdx':   'BlockdevOptionsGenericFormat',
> +  'virtio-blk-vhost-user':
> +{ 'type': 'BlockdevOptionsVirtioBlkVhostUser',
> +  'if': 'CONFIG_BLKIO' },
> +  'virtio-blk-vhost-vdpa':
> +{ 'type': 'BlockdevOptionsVirtioBlkVhostVdpa',
> +  'if': 'CONFIG_BLKIO' },
>'vmdk':   'BlockdevOptionsGenericCOWFormat',
>'vpc':'BlockdevOptionsGenericFormat',
>'vvfat':  'BlockdevOptionsVVFAT'

[...]

Re: Re: [PATCH v2] disas/riscv.c: rvv: Add disas support for vector instructions

2022-09-27 Thread 刘阳




 -Original Messages-
 From: "Alistair Francis"
 Sent Time: 2022-09-27 09:57:39 (Tuesday)
 To: "Yang Liu"
 Cc: "Palmer Dabbelt", "Alistair Francis", "Bin Meng", "Tommy Wu", "open list:RISC-V", "qemu-devel@nongnu.org Developers", wangjunqiang, "Wei Wu (吴伟)", liweiwei
 Subject: Re: [PATCH v2] disas/riscv.c: rvv: Add disas support for vector instructions
 
 On Fri, Sep 23, 2022 at 2:27 PM Alistair Francis wrote:
 
  On Fri, Aug 26, 2022 at 1:26 PM Yang Liu wrote:
  
   Tested with https://github.com/ksco/rvv-decoder-tests
  
   Expected checkpatch errors for consistency and brevity reasons:
  
   ERROR: line over 90 characters
   ERROR: trailing statements should be on next line
   ERROR: braces {} are necessary for all arms of this statement
  
   Signed-off-by: Yang Liu 
 
  Thanks!
 
  Applied to riscv-to-apply.next
 
 This patch fails to build with this error:
 
 ../disas/riscv.c: In function 'print_insn_riscv':
 ../disas/riscv.c:4513:30: error: '__builtin___sprintf_chk' may write a
 terminating nul past the end of the destination
 [-Werror=format-overflow=]
  4513 | sprintf(nbuf, "%d", sew);
   |  ^
 In file included from /usr/include/stdio.h:906,
  from /scratch/jenkins-tmp/workspace/QEMU-Multi-Config-Build/BUILD_OPTIONS/GCC/include/qemu/osdep.h:97,
  from ../disas/riscv.c:20:
 In function 'sprintf',
 inlined from 'format_inst' at ../disas/riscv.c:4513:13,
 inlined from 'disasm_inst' at ../disas/riscv.c:4640:5,
 inlined from 'print_insn_riscv' at ../disas/riscv.c:4690:5:
 /usr/include/bits/stdio2.h:30:10: note: '__builtin___sprintf_chk'
 output between 2 and 5 bytes into a destination of size 4
30 |   return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
   |  ^~
31 |   __glibc_objsize (__s), __fmt,
   |   ~
32 |   __va_arg_pack ());
   |   ~
 
 
 Alistair
 

Thanks for the review, I've submitted a v3 patch.

Yang
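
For reference, a minimal sketch of how this class of -Wformat-overflow
failure is commonly avoided: use snprintf() with the real buffer size and
check for truncation. This is not necessarily what v3 does; format_sew() is
a hypothetical helper written only for this example.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format an integer (e.g. an element width) into
 * buf without ever writing past buflen bytes.
 * Returns 0 on success, -1 if the output did not fit.
 */
static int format_sew(char *buf, size_t buflen, int sew)
{
    int n = snprintf(buf, buflen, "%d", sew);

    /* snprintf() returns the length it wanted to write, excluding the NUL. */
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}
```

A 4-byte destination, as in the diagnostic above, holds at most three digits
plus the terminating NUL, so a four-digit value is reported as truncated
rather than overflowing the buffer.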

decode_inst_operands(dec);
decode_inst_decompress(dec, isa);
decode_inst_lift_pseudo(dec);
   -format_inst(buf, buflen, 16, dec);
   +format_inst(buf, buflen, 24, dec);
}
  
#define INST_FMT_2 "%04" PRIx64 "  "
   --
   2.30.1 (Apple Git-130)

[PATCH v3] disas/riscv.c: rvv: Add disas support for vector instructions

2022-09-27 Thread Yang Liu

Tested with https://github.com/ksco/rvv-decoder-tests

Expected checkpatch errors for consistency and brevity reasons:

ERROR: line over 90 characters
ERROR: trailing statements should be on next line
ERROR: braces {} are necessary for all arms of this statement

Signed-off-by: Yang Liu 
---
 disas/riscv.c | 1432 -
 1 file changed, 1430 insertions(+), 2 deletions(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index f107d94c4c..d216b9c39b 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -158,6 +158,11 @@ typedef enum {
 rv_codec_css_sqsp,
 rv_codec_k_bs,
 rv_codec_k_rnum,
+rv_codec_v_r,
+rv_codec_v_ldst,
+rv_codec_v_i,
+rv_codec_vsetvli,
+rv_codec_vsetivli,
 } rv_codec;
 
 typedef enum {
@@ -560,6 +565,376 @@ typedef enum {
 rv_op_zip = 396,
 rv_op_xperm4 = 397,
 rv_op_xperm8 = 398,
+rv_op_vle8_v = 399,
+rv_op_vle16_v = 400,
+rv_op_vle32_v = 401,
+rv_op_vle64_v = 402,
+rv_op_vse8_v = 403,
+rv_op_vse16_v = 404,
+rv_op_vse32_v = 405,
+rv_op_vse64_v = 406,
+rv_op_vlm_v = 407,
+rv_op_vsm_v = 408,
+rv_op_vlse8_v = 409,
+rv_op_vlse16_v = 410,
+rv_op_vlse32_v = 411,
+rv_op_vlse64_v = 412,
+rv_op_vsse8_v = 413,
+rv_op_vsse16_v = 414,
+rv_op_vsse32_v = 415,
+rv_op_vsse64_v = 416,
+rv_op_vluxei8_v = 417,
+rv_op_vluxei16_v = 418,
+rv_op_vluxei32_v = 419,
+rv_op_vluxei64_v = 420,
+rv_op_vloxei8_v = 421,
+rv_op_vloxei16_v = 422,
+rv_op_vloxei32_v = 423,
+rv_op_vloxei64_v = 424,
+rv_op_vsuxei8_v = 425,
+rv_op_vsuxei16_v = 426,
+rv_op_vsuxei32_v = 427,
+rv_op_vsuxei64_v = 428,
+rv_op_vsoxei8_v = 429,
+rv_op_vsoxei16_v = 430,
+rv_op_vsoxei32_v = 431,
+rv_op_vsoxei64_v = 432,
+rv_op_vle8ff_v = 433,
+rv_op_vle16ff_v = 434,
+rv_op_vle32ff_v = 435,
+rv_op_vle64ff_v = 436,
+rv_op_vl1re8_v = 437,
+rv_op_vl1re16_v = 438,
+rv_op_vl1re32_v = 439,
+rv_op_vl1re64_v = 440,
+rv_op_vl2re8_v = 441,
+rv_op_vl2re16_v = 442,
+rv_op_vl2re32_v = 443,
+rv_op_vl2re64_v = 444,
+rv_op_vl4re8_v = 445,
+rv_op_vl4re16_v = 446,
+rv_op_vl4re32_v = 447,
+rv_op_vl4re64_v = 448,
+rv_op_vl8re8_v = 449,
+rv_op_vl8re16_v = 450,
+rv_op_vl8re32_v = 451,
+rv_op_vl8re64_v = 452,
+rv_op_vs1r_v = 453,
+rv_op_vs2r_v = 454,
+rv_op_vs4r_v = 455,
+rv_op_vs8r_v = 456,
+rv_op_vadd_vv = 457,
+rv_op_vadd_vx = 458,
+rv_op_vadd_vi = 459,
+rv_op_vsub_vv = 460,
+rv_op_vsub_vx = 461,
+rv_op_vrsub_vx = 462,
+rv_op_vrsub_vi = 463,
+rv_op_vwaddu_vv = 464,
+rv_op_vwaddu_vx = 465,
+rv_op_vwadd_vv = 466,
+rv_op_vwadd_vx = 467,
+rv_op_vwsubu_vv = 468,
+rv_op_vwsubu_vx = 469,
+rv_op_vwsub_vv = 470,
+rv_op_vwsub_vx = 471,
+rv_op_vwaddu_wv = 472,
+rv_op_vwaddu_wx = 473,
+rv_op_vwadd_wv = 474,
+rv_op_vwadd_wx = 475,
+rv_op_vwsubu_wv = 476,
+rv_op_vwsubu_wx = 477,
+rv_op_vwsub_wv = 478,
+rv_op_vwsub_wx = 479,
+rv_op_vadc_vvm = 480,
+rv_op_vadc_vxm = 481,
+rv_op_vadc_vim = 482,
+rv_op_vmadc_vvm = 483,
+rv_op_vmadc_vxm = 484,
+rv_op_vmadc_vim = 485,
+rv_op_vsbc_vvm = 486,
+rv_op_vsbc_vxm = 487,
+rv_op_vmsbc_vvm = 488,
+rv_op_vmsbc_vxm = 489,
+rv_op_vand_vv = 490,
+rv_op_vand_vx = 491,
+rv_op_vand_vi = 492,
+rv_op_vor_vv = 493,
+rv_op_vor_vx = 494,
+rv_op_vor_vi = 495,
+rv_op_vxor_vv = 496,
+rv_op_vxor_vx = 497,
+rv_op_vxor_vi = 498,
+rv_op_vsll_vv = 499,
+rv_op_vsll_vx = 500,
+rv_op_vsll_vi = 501,
+rv_op_vsrl_vv = 502,
+rv_op_vsrl_vx = 503,
+rv_op_vsrl_vi = 504,
+rv_op_vsra_vv = 505,
+rv_op_vsra_vx = 506,
+rv_op_vsra_vi = 507,
+rv_op_vnsrl_wv = 508,
+rv_op_vnsrl_wx = 509,
+rv_op_vnsrl_wi = 510,
+rv_op_vnsra_wv = 511,
+rv_op_vnsra_wx = 512,
+rv_op_vnsra_wi = 513,
+rv_op_vmseq_vv = 514,
+rv_op_vmseq_vx = 515,
+rv_op_vmseq_vi = 516,
+rv_op_vmsne_vv = 517,
+rv_op_vmsne_vx = 518,
+rv_op_vmsne_vi = 519,
+rv_op_vmsltu_vv = 520,
+rv_op_vmsltu_vx = 521,
+rv_op_vmslt_vv = 522,
+rv_op_vmslt_vx = 523,
+rv_op_vmsleu_vv = 524,
+rv_op_vmsleu_vx = 525,
+rv_op_vmsleu_vi = 526,
+rv_op_vmsle_vv = 527,
+rv_op_vmsle_vx = 528,
+rv_op_vmsle_vi = 529,
+rv_op_vmsgtu_vx = 530,
+rv_op_vmsgtu_vi = 531,
+rv_op_vmsgt_vx = 532,
+rv_op_vmsgt_vi = 533,
+rv_op_vminu_vv = 534,
+rv_op_vminu_vx = 535,
+rv_op_vmin_vv = 536,
+rv_op_vmin_vx = 537,
+rv_op_vmaxu_vv = 538,
+rv_op_vmaxu_vx = 539,
+rv_op_vmax_vv = 540,
+rv_op_vmax_vx = 541,
+rv_op_vmul_vv = 542,
+rv_op_vmul_vx = 543,
+rv_op_vmulh_vv = 544,
+rv_op_vmulh_vx = 545,
+rv_op_vmulhu_vv = 546,
+rv_op_vmulhu_vx = 547,
+rv_op_vmulhsu_vv = 548,
+rv_op_vmulhsu_vx = 549,
+rv_op_vdivu_vv =

[PATCH v1 2/2] riscv/opentitan: connect lifecycle controller

2022-09-27 Thread Wilfred Mallawa

From: Wilfred Mallawa 

Connect the Ibex lifecycle controller to OpenTitan.
With this change, we can now get past the lifecycle checks
in the boot ROM.

Signed-off-by: Wilfred Mallawa 
---
 hw/riscv/opentitan.c | 10 --
 include/hw/riscv/opentitan.h |  2 ++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/riscv/opentitan.c b/hw/riscv/opentitan.c
index be7ff1eea0..73a5cef694 100644
--- a/hw/riscv/opentitan.c
+++ b/hw/riscv/opentitan.c
@@ -122,6 +122,8 @@ static void lowrisc_ibex_soc_init(Object *obj)
 
 object_initialize_child(obj, "timer", &s->timer, TYPE_IBEX_TIMER);
 
+object_initialize_child(obj, "lifetime_ctrl", &s->lc, TYPE_IBEX_LC_CTRL);
+
 for (int i = 0; i < OPENTITAN_NUM_SPI_HOSTS; i++) {
 object_initialize_child(obj, "spi_host[*]", &s->spi_host[i],
 TYPE_IBEX_SPI_HOST);
@@ -243,6 +245,12 @@ static void lowrisc_ibex_soc_realize(DeviceState *dev_soc, 
Error **errp)
 }
 }
 
+/* Life-Cycle Control */
+if (!sysbus_realize(SYS_BUS_DEVICE(&s->lc), errp)) {
+return;
+}
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->lc), 0, memmap[IBEX_DEV_LC_CTRL].base);
+
 create_unimplemented_device("riscv.lowrisc.ibex.gpio",
 memmap[IBEX_DEV_GPIO].base, memmap[IBEX_DEV_GPIO].size);
 create_unimplemented_device("riscv.lowrisc.ibex.spi_device",
@@ -255,8 +263,6 @@ static void lowrisc_ibex_soc_realize(DeviceState *dev_soc, 
Error **errp)
 memmap[IBEX_DEV_SENSOR_CTRL].base, memmap[IBEX_DEV_SENSOR_CTRL].size);
 create_unimplemented_device("riscv.lowrisc.ibex.otp_ctrl",
 memmap[IBEX_DEV_OTP_CTRL].base, memmap[IBEX_DEV_OTP_CTRL].size);
-create_unimplemented_device("riscv.lowrisc.ibex.lc_ctrl",
-memmap[IBEX_DEV_LC_CTRL].base, memmap[IBEX_DEV_LC_CTRL].size);
 create_unimplemented_device("riscv.lowrisc.ibex.pwrmgr",
 memmap[IBEX_DEV_PWRMGR].base, memmap[IBEX_DEV_PWRMGR].size);
 create_unimplemented_device("riscv.lowrisc.ibex.rstmgr",
diff --git a/include/hw/riscv/opentitan.h b/include/hw/riscv/opentitan.h
index 6665cd5794..64b7f21339 100644
--- a/include/hw/riscv/opentitan.h
+++ b/include/hw/riscv/opentitan.h
@@ -24,6 +24,7 @@
 #include "hw/char/ibex_uart.h"
 #include "hw/timer/ibex_timer.h"
 #include "hw/ssi/ibex_spi_host.h"
+#include "hw/misc/ibex_lc_ctrl.h"
 #include "qom/object.h"
 
 #define TYPE_RISCV_IBEX_SOC "riscv.lowrisc.ibex.soc"
@@ -44,6 +45,7 @@ struct LowRISCIbexSoCState {
 SiFivePLICState plic;
 IbexUartState uart;
 IbexTimerState timer;
+IbexLCState lc;
 IbexSPIHostState spi_host[OPENTITAN_NUM_SPI_HOSTS];
 
 uint32_t resetvec;
-- 
2.37.3

[PATCH v1 1/2] hw/misc: add ibex lifecycle controller

2022-09-27 Thread Wilfred Mallawa

From: Wilfred Mallawa 

Device model for the OpenTitan lifecycle controller as per [1].

Addition of this model is the first of many steps to adding `boot_rom`
support for OpenTitan. The OpenTitan `boot_rom` needs to access the
lifecycle controller during the init/test sequence before it jumps to
flash. With this model, we can get past the lifecycle controller stages
in boot.

Currently the supported functionality is limited.

[1] https://docs.opentitan.org/hw/ip/lc_ctrl/doc/

Signed-off-by: Wilfred Mallawa 
---
 hw/misc/ibex_lc_ctrl.c | 287 +
 hw/misc/meson.build|   3 +
 hw/misc/trace-events   |   5 +
 include/hw/misc/ibex_lc_ctrl.h | 121 ++
 4 files changed, 416 insertions(+)
 create mode 100644 hw/misc/ibex_lc_ctrl.c
 create mode 100644 include/hw/misc/ibex_lc_ctrl.h

diff --git a/hw/misc/ibex_lc_ctrl.c b/hw/misc/ibex_lc_ctrl.c
new file mode 100644
index 00..f034a92a9c
--- /dev/null
+++ b/hw/misc/ibex_lc_ctrl.c
@@ -0,0 +1,287 @@
+/*
+ * QEMU model of the Ibex Life Cycle Controller
+ * SPEC Reference: https://docs.opentitan.org/hw/ip/lc_ctrl/doc/
+ *
+ * Copyright (C) 2022 Western Digital
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "hw/misc/ibex_lc_ctrl.h"
+#include "hw/irq.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
+#include "migration/vmstate.h"
+#include "trace.h"
+
+REG32(ALERT_TEST, 0x00)
+FIELD(ALERT_TEST, FETAL_PROG_ERR, 0, 1)
+FIELD(ALERT_TEST, FETAL_STATE_ERR, 1, 1)
+FIELD(ALERT_TEST, FETAL_BUS_INTEG_ERR, 2, 1)
+REG32(CTRL_STATUS, 0x04)
+FIELD(CTRL_STATUS, READY, 0, 1)
+FIELD(CTRL_STATUS, TRANSITION_SUCCESSFUL, 0, 1)
+FIELD(CTRL_STATUS, TRANSITION_COUNT_ERROR, 1, 1)
+FIELD(CTRL_STATUS, TRANSITION_ERROR, 2, 1)
+FIELD(CTRL_STATUS, TOKEN_ERROR, 3, 1)
+FIELD(CTRL_STATUS, FLASH_RMA_ERROR, 4, 1)
+FIELD(CTRL_STATUS, OTP_ERROR, 5, 1)
+FIELD(CTRL_STATUS, STATE_ERROR, 6, 1)
+FIELD(CTRL_STATUS, BUS_INTEG_ERROR, 7, 1)
+FIELD(CTRL_STATUS, OTP_PARTITION_ERROR, 8, 1)
+REG32(CLAIM_TRANSITION_IF, 0x08)
+ FIELD(CLAIM_TRANSITION_IF, MUTEX, 0, 8)
+REG32(TRANSITION_REGWEN , 0x0C)
+ FIELD(TRANSITION_REGWEN , TRANSITION_REGWEN, 0, 1)
+REG32(TRANSITION_CMD , 0x10)
+ FIELD(TRANSITION_CMD , START, 0, 1)
+REG32(TRANSITION_CTRL , 0x14)
+ FIELD(TRANSITION_CTRL , EXT_CLOCK_EN, 0, 1)
+REG32(TRANSITION_TOKEN_0 , 0x18)
+ FIELD(TRANSITION_TOKEN_0 , TRANSITION_TOKEN_0, 0, 32)
+REG32(TRANSITION_TOKEN_1 , 0x1C)
+ FIELD(TRANSITION_TOKEN_1 , TRANSITION_TOKEN_1, 0, 32)
+REG32(TRANSITION_TOKEN_2 , 0x20)
+ FIELD(TRANSITION_TOKEN_2 , TRANSITION_TOKEN_2, 0, 32)
+REG32(TRANSITION_TOKEN_3 , 0x24)
+ FIELD(TRANSITION_TOKEN_3 , TRANSITION_TOKEN_3, 0, 32)
+REG32(TRANSITION_TARGET , 0x28)
+ FIELD(TRANSITION_TARGET , STATE, 0, 30)
+REG32(OTP_VENDOR_TEST_CTRL , 0x2C)
+ FIELD(OTP_VENDOR_TEST_CTRL , OTP_VENDOR_TEST_CTRL, 0, 32)
+REG32(OTP_VENDOR_TEST_STATUS , 0x30)
+ FIELD(OTP_VENDOR_TEST_STATUS , OTP_VENDOR_TEST_STATUS, 0, 32)
+REG32(LC_STATE , 0x34)
+ FIELD(LC_STATE , STATE, 0, 30)
+REG32(LC_TRANSITION_CNT , 0x38)
+ FIELD(LC_TRANSITION_CNT , CNT, 0, 5)
+REG32(LC_ID_STATE , 0x3C)
+ FIELD(LC_ID_STATE , STATE, 0, 32)
+REG32(HW_REV , 0x40)
+ FIELD(HW_REV , CHIP_REV, 0, 16)
+ FIELD(HW_REV , CHIP_GEN, 16, 16)
+REG32(DEVICE_ID_0 , 0x44)
+ FIELD(DEVICE_ID_0 , DEVICE_ID_0, 0, 32)
+REG32(DEVICE_ID_1 , 0x48)
+ FIELD(DEVICE_ID_1 , DEVICE_ID_2, 0, 32)
+REG32(DEVICE_ID_2 , 0x4C)
+ FIELD(DEVICE_ID_2 , DEVICE_ID_2, 0, 32)
+REG32(DEVICE_ID_3 , 0x50)
+ FIELD(DEVICE_ID_3 , DEVICE_ID_3, 0, 32)
+REG32(DEVICE_ID_4 , 0x54)
+ FIELD(DEVICE_ID_4 , DEVICE_ID_4, 0, 32)
+REG32(DEVICE_ID_5 , 0x58)
+ FIELD(DEVICE_ID_5 , DEVICE_ID_5, 0, 32)
+REG32(DEVICE_ID_6 , 0x5C)
+ FIELD(DEVICE_ID_6 , DEVICE_ID_6, 0,

[PATCH v1 0/2] Add OpenTitan lifecycle controller

2022-09-27 Thread Wilfred Mallawa

From: Wilfred Mallawa 

This series of patches:
- Add OpenTitan lifecycle controller with basic functionality
- Connects it to OpenTitan

Currently in OpenTitan, we skip the `boot_rom` since it has become more
complex and we do not have all the support in QEMU to use it. One of the
missing pieces to getting the `boot_rom` working is the lifecycle
controller. There's a particular `magic_value` that is kept in the
`LC_STATE` register which is checked by the `boot_rom`.
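
As a rough illustration of how such a register field is read, here is a
minimal sketch. It mirrors the `FIELD(LC_STATE, STATE, 0, 30)` declaration
from patch 1/2 (shift 0, length 30); field_extract32() is a hypothetical
stand-in written for this example, not QEMU's own helper, and no actual
`magic_value` is assumed.

```c
#include <stdint.h>

/* Shift and length taken from FIELD(LC_STATE, STATE, 0, 30) in patch 1/2. */
#define LC_STATE_STATE_SHIFT  0
#define LC_STATE_STATE_LENGTH 30

/*
 * Hypothetical helper: extract `length` bits starting at bit `shift`
 * from a 32-bit register value.
 */
static uint32_t field_extract32(uint32_t reg, unsigned shift, unsigned length)
{
    /* Avoid an undefined 32-bit shift when the field spans the register. */
    uint32_t mask = length >= 32 ? UINT32_MAX
                                 : ((UINT32_C(1) << length) - 1);

    return (reg >> shift) & mask;
}
```

A check like the one in the `boot_rom` would compare the extracted STATE
field against its expected value; the top two bits of the register are not
part of the field.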

With this basic implementation of the device, we can get past the
lifecycle controller in the `boot_rom` check stage. Testing was done
using TockOS (running QEMU with `-d in_asm`) to see how far in the boot
rom we get.

End goal is to add support to all device models in QEMU that are
required by the OpenTitan `boot_rom` such that we can use it as the
`bios`.

Wilfred Mallawa (2):
  hw/misc: add ibex lifecycle controller
  riscv/opentitan: connect lifecycle controller

 hw/misc/ibex_lc_ctrl.c | 287 +
 hw/misc/meson.build|   3 +
 hw/misc/trace-events   |   5 +
 hw/riscv/opentitan.c   |  10 +-
 include/hw/misc/ibex_lc_ctrl.h | 121 ++
 include/hw/riscv/opentitan.h   |   2 +
 6 files changed, 426 insertions(+), 2 deletions(-)
 create mode 100644 hw/misc/ibex_lc_ctrl.c
 create mode 100644 include/hw/misc/ibex_lc_ctrl.h

-- 
2.37.3

Re: [PATCH] target/arm: Use the max page size in a 2-stage ptw

2022-09-27 Thread Zenghui Yu via


[ Fix Marc's email address ]

On 2022/9/13 21:56, Richard Henderson wrote:

We had only been reporting the stage2 page size.  This causes
problems if stage1 is using a larger page size (16k, 2M, etc),
but stage2 is using a smaller page size, because cputlb does
not set large_page_{addr,mask} properly.

Fix by using the max of the two page sizes.

Reported-by: Marc Zyngier 
Signed-off-by: Richard Henderson 
---

Hi Marc, I think this will fix the issue that you mentioned on Monday.
It certainly appears to fit the bill vs the described symptoms.

This is based on my ptw.c rewrite, full tree at

https://gitlab.com/rth7680/qemu/-/tree/tgt-arm-rme

Based-on: 20220822152741.1617527-1-richard.hender...@linaro.org
("[PATCH v2 00/66] target/arm: Implement FEAT_HAFDBS")


r~

---
 target/arm/ptw.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index c81c51f60c..510939fc89 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -2509,7 +2509,7 @@ static bool get_phys_addr_twostage(CPUARMState *env, 
target_ulong address,
ARMMMUFaultInfo *fi)
 {
 hwaddr ipa;
-int s1_prot;
+int s1_prot, s1_lgpgsz;
 bool ret;
 bool ipa_secure;
 ARMCacheAttrs cacheattrs1;
@@ -2550,6 +2550,7 @@ static bool get_phys_addr_twostage(CPUARMState *env, 
target_ulong address,
  * prot and cacheattrs later.
  */
 s1_prot = result->f.prot;
+s1_lgpgsz = result->f.lg_page_size;
 cacheattrs1 = result->cacheattrs;
 memset(result, 0, sizeof(*result));
 
@@ -2565,6 +2566,14 @@ static bool get_phys_addr_twostage(CPUARMState *env, target_ulong address,

 return ret;
 }
 
+/*

+ * Use the maximum of the S1 & S2 page size, so that invalidation
+ * of pages > TARGET_PAGE_SIZE works correctly.
+ */
+if (result->f.lg_page_size < s1_lgpgsz) {
+result->f.lg_page_size = s1_lgpgsz;
+}
+
 /* Combine the S1 and S2 cache attributes. */
 hcr = arm_hcr_el2_eff_secstate(env, is_secure);
 if (hcr & HCR_DC) {
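
The core of the fix quoted above can be sketched in isolation as follows.
This is only an illustration: combined_lg_page_size() is a hypothetical
helper, and the values are log2 of the page size in bytes, as suggested by
the lg_page_size field name.

```c
/*
 * Hypothetical helper: given the log2 page sizes found by the stage 1
 * and stage 2 walks, return the larger one, so that the reported size
 * covers the whole stage 1 mapping.
 */
static int combined_lg_page_size(int s1_lgpgsz, int s2_lgpgsz)
{
    return s2_lgpgsz < s1_lgpgsz ? s1_lgpgsz : s2_lgpgsz;
}
```

For example, a 2M stage 1 page (log2 = 21) over a 4K stage 2 page
(log2 = 12) is reported as 21 rather than 12.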

Re: [PATCH v2 07/11] acpi/tests/bits: add python test that exercizes QEMU bios tables using biosbits

2022-09-27 Thread Michael S. Tsirkin

On Wed, Sep 28, 2022 at 08:38:54AM +0530, Ani Sinha wrote:
> > I don't really care where we upload them but only having the
> > latest version is just going to break anything expecting
> > the old binary.
> 
> In fairness, I am not entirely certain if there is a tight coupling
> between the qemu tests and the bits binaries. I have written the test
> framework in a way such that test modifications and new tests can be
> pushed into the bits binaries and the iso gets regenerated with the
> new tests from QEMU itself before running the tests. Only when we need
> bits bugfixes or say upgrade to new acpica that we would need to
> regenerate the bits binaries.

Theoretically, that's correct. But if we did not have bugs we would
not need tests.

-- 
MST

[PATCH] virtio: del net client if net_init_tap_one failed

2022-09-27 Thread luzhipeng

From: lu zhipeng 

If the net tap is initialized successfully but a later step fails
during network card hot-plugging, the net tap client remains,
so clean it up.

Signed-off-by: lu zhipeng 
---
 net/tap.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index b3ddfd4a74..e203d07a12 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -686,7 +686,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
 tap_set_sndbuf(s->fd, tap, &err);
 if (err) {
 error_propagate(errp, err);
-return;
+goto failed;
 }
 
 if (tap->has_fd || tap->has_fds) {
@@ -726,12 +726,12 @@ static void net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
 } else {
 warn_report_err(err);
 }
-return;
+goto failed;
 }
 if (!g_unix_set_fd_nonblocking(vhostfd, true, NULL)) {
 error_setg_errno(errp, errno, "%s: Can't use file descriptor %d",
  name, fd);
-return;
+goto failed;
 }
 } else {
 vhostfd = open("/dev/vhost-net", O_RDWR);
@@ -743,11 +743,11 @@ static void net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
 warn_report("tap: open vhost char device failed: %s",
 strerror(errno));
 }
-return;
+goto failed;
 }
 if (!g_unix_set_fd_nonblocking(vhostfd, true, NULL)) {
 error_setg_errno(errp, errno, "Failed to set FD nonblocking");
-return;
+goto failed;
 }
 }
 options.opaque = (void *)(uintptr_t)vhostfd;
@@ -760,11 +760,17 @@ static void net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
 } else {
 warn_report(VHOST_NET_INIT_FAILED);
 }
-return;
+goto failed;
 }
 } else if (vhostfdname) {
 error_setg(errp, "vhostfd(s)= is not valid without vhost");
+goto failed;
 }
+
+return;
+
+failed:
+qemu_del_net_client(&s->nc);
 }
 
 static int get_fds(char *str, char *fds[], int max)
-- 
2.31.1
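
The cleanup pattern used by the patch above (every error path jumps to a
single label that releases the already-created client) can be sketched in
isolation like this. All names here are hypothetical stand-ins written for
the example; none of them are QEMU APIs.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a registered net client. */
typedef struct Client {
    bool registered;
} Client;

/* Number of clients currently alive, to make leaks visible. */
static int live_clients;

static Client *client_new(void)
{
    Client *c = calloc(1, sizeof(*c));

    if (c) {
        c->registered = true;
        live_clients++;
    }
    return c;
}

static void client_del(Client *c)
{
    if (c) {
        live_clients--;
        free(c);
    }
}

/*
 * Create a client, then run two later setup steps. failing_step selects
 * which step fails (0 = none). On any failure the client is deleted, so
 * no half-initialized client is left behind.
 */
static Client *init_one(int failing_step)
{
    Client *c = client_new();

    if (!c) {
        return NULL;
    }
    if (failing_step == 1) {
        goto failed;    /* e.g. a buffer-size setup step failed */
    }
    if (failing_step == 2) {
        goto failed;    /* e.g. a backend open step failed */
    }
    return c;

failed:
    client_del(c);
    return NULL;
}
```

Funnelling every error path through one label keeps the cleanup in a single
place, which is what replacing each early `return` with `goto failed`
achieves in the patch.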

Re: [PATCH v2 07/11] acpi/tests/bits: add python test that exercizes QEMU bios tables using biosbits

2022-09-27 Thread Ani Sinha

On Wed, Sep 28, 2022 at 2:48 AM Michael S. Tsirkin  wrote:
>
> On Tue, Sep 27, 2022 at 09:33:27AM +0100, Daniel P. Berrangé wrote:
> > On Tue, Sep 27, 2022 at 01:43:15PM +0530, Ani Sinha wrote:
> > > On Sun, Sep 18, 2022 at 1:58 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Fri, Sep 16, 2022 at 09:30:42PM +0530, Ani Sinha wrote:
> > > > > On Thu, Jul 28, 2022 at 12:08 AM Ani Sinha  wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, 25 Jul 2022, Ani Sinha wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sat, 16 Jul 2022, Michael S. Tsirkin wrote:
> > > > > > >
> > > > > > > > On Sat, Jul 16, 2022 at 12:06:00PM +0530, Ani Sinha wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jul 15, 2022 at 11:20 Michael S. Tsirkin 
> > > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Jul 15, 2022 at 09:47:27AM +0530, Ani Sinha wrote:
> > > > > > > > > > > > Instead of all this mess, can't we just spawn e.g.
> > > > > > > > > > > > "git clone --depth 1"?
> > > > > > > > > > > > And if the directory exists I would fetch and checkout.
> > > > > > > > > > >
> > > > > > > > > > > There are two reasons I can think of why I do not like
> > > > > > > > > > > this idea:
> > > > > > > > > > >
> > > > > > > > > > > (a) a git clone of a whole directory would download all
> > > > > > > > > > > versions of the binary whereas we want only a specific
> > > > > > > > > > > version.
> > > > > > > > > >
> > > > > > > > > > You mention shallow clone yourself, and I used --depth 1
> > > > > > > > > > above.
> > > > > > > > > >
> > > > > > > > > > > Downloading a single file by shallow cloning or creating
> > > > > > > > > > > a git archive is overkill IMHO when a wget style
> > > > > > > > > > > retrieval works just fine.
> > > > > > > > > >
> > > > > > > > > > However, it does not provide for versioning, tagging etc
> > > > > > > > > > so you have to implement your own schema.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Hmm I’m not sure if we need all that. Bits has its own
> > > > > > > > > > versioning mechanism and I think all we need to do is
> > > > > > > > > > maintain the same versioning logic and maintain binaries
> > > > > > > > > > of different versions. Do we really need the power of
> > > > > > > > > > git/version control here? Dunno.
> > > > > > > >
> > > > > > > > > Well we need some schema. Given we are not using official
> > > > > > > > > bits releases I don't think we can reuse theirs.
> > > > > > >
> > > > > > > > OK fine. Let's figure out how to push bits somewhere in
> > > > > > > > git.qemu.org and the binaries in some other repo first.
> > > > > > > > Everything else hinges on that. We can fix the rest of the
> > > > > > > > bits later incrementally.
> > > > > >
> > > > > > > DanPB, any thoughts on putting bits on git.qemu.org or where
> > > > > > > and how to keep the binaries?
> > > > >
> > > > > Can we please conclude on this?
> > > > > Peter, can you please fork the repo? I have tried many times to reach
> > > > > you on IRC but failed.
> > > >
> > > > Probably because of travel around KVM forum.
> > > >
> > > > I think given our CI is under pressure again due to gitlab free tier
> > > > limits, tying binaries to CI isn't a great idea at this stage.
> > > > > Can Ani just upload binaries to qemu.org for now?
> > >
> > > I agree with Michael here. Having a full ci/cd job for this is
> > > overkill IMHO. We should create a repo just for the binaries, have a
> > > README there to explain how we generate them and check in new versions
> > > as and when needed (it won't be frequent).
> > > How about biosbits-bin repo?
> >
> > If QEMU is hosting binaries, where any part contains GPL code, then we
> > need to be providing the full and corresponding source and the build
> > scripts needed to re-create the binary. Once we have such scripts it
> > should be trivial to trigger that from a CI job. If it isn't then
> > we're doing something wrong.  The CI quota is not an issue, because
> > this is not a job that we need to run continuously. It can be triggered
> > manually as & when we decide we need to refresh the binary, so would
> > be a small one-off CI quota hit.
> >
> > Also note that gitlab is intending to start enforcing storage quota
> > on projects in the not too distant future. This makes it unappealing
> > to store binaries in git repos, unless we genuinely need the ability
> > to access historical versions of the binary. I don't believe we need
> > that for biosbits.
> >
> > The binary can be published as a CI artifact and accessed directly
> > from the latest artifact download link. This ensures we only consume
> > quota for the most recently published binary artifact. So I don't see
> > a compelling reason to upload binaries into git.
> >
> > With regards,
> > Daniel
>
> I don't

Re: [PATCH v7 2/2] i386: Add notify VM exit support

2022-09-27 Thread Chenyi Qiang





On 9/27/2022 9:43 PM, Paolo Bonzini wrote:

On 9/23/22 09:33, Chenyi Qiang wrote:

Because there are some concerns, e.g. a notify VM exit may happen with
VM_CONTEXT_INVALID set in exit qualification (no cases are anticipated
that would set this bit), which means VM context is corrupted. To avoid
the false positive and a well-behaved guest gets killed, make this
feature disabled by default. Users can enable the feature by a new
machine property:
 qemu -machine notify_vmexit=on,notify_window=0 ...


Some comments on the interface:

- the argument should be one of "run" (i.e. do nothing and continue, the
default), "internal-error" (i.e. raise a KVM internal error), "disable"
(i.e. do not enable the capability).  You can add the enum to
qapi/runstate.json and use object_class_property_add_enum to define
the QOM property.



So, IIUC, the three options of notify-vmexit would be:
1. run (enable the capability but do nothing if the vmexit happens)
2. internal-error (enable the capability and raise a KVM internal error 
if it happens)

3. disable (do not enable the capability)

For the invalid context case, exit and raise a KVM internal error 
unconditionally.
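
The three options summarized above, plus the unconditional handling of an invalid context, could be modeled roughly as follows. This is only a sketch under stated assumptions, not the QEMU implementation: the enum, the action type, and `notify_vmexit_action()` are hypothetical names chosen for illustration, and the real enum would be generated from qapi/runstate.json.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the QAPI enum discussed in this thread. */
typedef enum {
    NOTIFY_VMEXIT_RUN,            /* capability enabled, continue on exit */
    NOTIFY_VMEXIT_INTERNAL_ERROR, /* capability enabled, raise internal error */
    NOTIFY_VMEXIT_DISABLE,        /* capability not enabled at all */
} NotifyVmexitOption;

/* Hypothetical result type: what user space does after the exit. */
typedef enum {
    ACTION_CONTINUE,
    ACTION_INTERNAL_ERROR,
} NotifyAction;

/*
 * Decide how to react to a notify VM exit. An invalid VM context is
 * treated as fatal regardless of the configured option.
 */
static NotifyAction notify_vmexit_action(NotifyVmexitOption opt,
                                         bool ctx_invalid)
{
    if (ctx_invalid) {
        return ACTION_INTERNAL_ERROR;
    }
    return opt == NOTIFY_VMEXIT_INTERNAL_ERROR ? ACTION_INTERNAL_ERROR
                                               : ACTION_CONTINUE;
}
```

With "disable" the exit is never delivered in the first place, so the function only matters for the other two options.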



- properties should have a dash ("-") in the name, not an underscore

- the property should be added to "-accel kvm,..." (on x86 only).  See
after my signature for a preparatory patch that adds a new
kvm_arch_accel_class_init hook.

The default would be either "run" or "disable".  Honestly I think it
should be "run", otherwise there's no point in adding the feature;
if it is not enabled by default, it is very likely that no one would
use it.



Yeah, personally speaking, I also prefer to enable it by default. In the
previous KVM patch discussion, we were worried that buggy silicon could
cause the invalid context case, which would kill a benign VM. But since
that is unlikely and we can't tell whether it is a false positive when it
happens, I think defaulting to "run" is acceptable.



A new KVM exit reason KVM_EXIT_NOTIFY is defined for notify VM exit. If
it happens with VM_INVALID_CONTEXT, hypervisor exits to user space to
inform the fatal case. Then user space can inject a SHUTDOWN event to
the target vcpu. This is implemented by injecting a synthesized triple
fault event.


I don't think a triple fault is a good match for an event that "should
not happen" and is the fault of the processor rather than the guest.
This should be a KVM internal error.  The workaround is to disable the
notify vmexit.

+    warn_report_once("KVM: encounter a notify exit with %svalid context in"
+ " guest. It means there can be possible misbehaves in"
+ " guest, please have a look.",
+ ctx_invalid ? "in" : "");


The warning should be unconditional if the context is invalid.



In valid context case, the warning can also notify the admin that the 
guest misbehaves. Is it necessary to remove it?



+    object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "uint32_t",


uint32 (not uint32_t)



...


+  x86_machine_get_notify_window,
+  x86_machine_set_notify_window, NULL, NULL);

+    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
+    "Set the notify window required by notify VM exit");


"Clock cycles without an event window after which a notification VM exit occurs"




Will Fix it. Thanks a lot!


Thanks,

Paolo

 From a5cb704991cfcda19a33c622833b69a8f6928530 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini 
Date: Tue, 27 Sep 2022 15:20:16 +0200
Subject: [PATCH] kvm: allow target-specific accelerator properties

Several hypervisor capabilities in KVM are target-specific.  When exposed
to QEMU users as accelerator properties (i.e. -accel kvm,prop=value), they
should not be available for all targets.

Add a hook for targets to add their own properties to -accel kvm; for
now no such property is defined.

Signed-off-by: Paolo Bonzini 

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 5acab1767f..f90c5cb285 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3737,6 +3737,8 @@ static void kvm_accel_class_init(ObjectClass *oc, void *data)

  NULL, NULL);
  object_class_property_set_description(oc, "dirty-ring-size",
  "Size of KVM dirty page ring buffer (default: 0, i.e. use bitmap)");

+
+    kvm_arch_accel_class_init(oc);
  }

  static const TypeInfo kvm_accel_type = {
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index efd6dee818..50868ebf60 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -353,6 +353,8 @@ bool kvm_device_supported(int vmfd, uint64_t type);

  extern const KVMCapabilityInfo kvm_arch_required_capabilities[];

+void kvm_arch_accel_class_init(ObjectClass *oc);
+
  void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run);
  MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run);

diff --git

Re: [PATCH v7 1/2] i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple fault

2022-09-27 Thread Chenyi Qiang





On 9/27/2022 9:14 PM, Paolo Bonzini wrote:

On 9/23/22 09:33, Chenyi Qiang wrote:

For the direct triple faults, i.e. hardware detected and KVM morphed
to VM-Exit, KVM will never lose them. But for triple faults synthesized
by KVM, e.g. the RSM path, if KVM exits to userspace before the request
is serviced, userspace could migrate the VM and lose the triple fault.

A new flag KVM_VCPUEVENT_VALID_TRIPLE_FAULT is defined to signal that
the event.triple_fault_pending field contains a valid state if the
KVM_CAP_X86_TRIPLE_FAULT_EVENT capability is enabled.


Note that you are not transmitting the field on migration.  You need
this on top:

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index b97d182e28..d4124973ce 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1739,7 +1739,7 @@ typedef struct CPUArchState {
  uint8_t has_error_code;
  uint8_t exception_has_payload;
  uint64_t exception_payload;
-    bool triple_fault_pending;
+    uint8_t triple_fault_pending;
  uint32_t ins_len;
  uint32_t sipi_vector;
  bool tsc_valid;
diff --git a/target/i386/machine.c b/target/i386/machine.c
index cecd476e98..310b125235 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -1562,6 +1562,25 @@ static const VMStateDescription vmstate_arch_lbr = {
  }
  };

+static bool triple_fault_needed(void *opaque)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+
+    return env->triple_fault_pending;
+}
+
+static const VMStateDescription vmstate_triple_fault = {
+    .name = "cpu/triple_fault",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = triple_fault_needed,
+    .fields = (VMStateField[]) {
+    VMSTATE_UINT8(env.triple_fault_pending, X86CPU),
+    VMSTATE_END_OF_LIST()
+    }
+};
+
  const VMStateDescription vmstate_x86_cpu = {
  .name = "cpu",
  .version_id = 12,
@@ -1706,6 +1725,7 @@ const VMStateDescription vmstate_x86_cpu = {
  &vmstate_amx_xtile,
  #endif
  &vmstate_arch_lbr,
+    &vmstate_triple_fault,
  NULL
  }
  };


Thanks Paolo for your review!

I'll add this in next version.

Re: [PATCH v2 1/3] hw/watchdog: wdt_ibex_aon.c: Implement the watchdog for the OpenTitan

2022-09-27 Thread Tyler Ng

Hi Eddie,


On Tue, Sep 27, 2022 at 3:04 PM Dong, Eddie  wrote:

> Hi Tyler:
>
> > +}
> > +
> > +/* Called when the bark timer expires */
> > +static void ibex_aon_barker_expired(void *opaque)
> > +{
> This may happen during ibex_aon_update_count(), right?
>
> > +IbexAONTimerState *s = IBEX_AON_TIMER(opaque);
> > +if (ibex_aon_update_count(s) &&
> This may happen during ibex_aon_update_count().
> Nested call may lead to incorrect s->regs[R_WDOG_COUNT] &
> s->wdog_last.
>
> Can you elaborate? The timers for bark and bite are not updated in
> "update_count".
>
> When 1st ibex_aon_update_count() is executing, and is in the place PPP
> (after updating s->regs[R_WDOG_COUNT] but before updating s->wdog_last), a
> timer (barker) may happen.
> Inside ibex_aon_barker_expired(), it calls ibex_aon_update_count() again
> (nest call), and update s->regs[R_WDOG_COUNT] & s->wdog_last, with the new
> value.
> After the nest execution ends, and returns to the initial point (PPP) ,
> the s->wdog_last is updated (with the value of 1st execution time), this
> leads to mismatch of s->regs[R_WDOG_COUNT] & s->wdog_last.
>
> This case may not be triggered at normal case, but if the guest read
> A_WDOG_COUNT, the 1st ibex_aon_update_count() does execute, and bring the
> potential issue.
>

I see, I wasn't aware that the virtual clock continued running while the
device address was being read.


> I think we can solve the problem, by not updating s->regs[R_WDOG_COUNT] &
> s->wdog_last in the timer call back API.  The update is not necessary given
> that the stored value is anyway not the current COUNT. We only need to
> update when the guest write the COUNT.
>
>
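
The suggestion quoted above (derive the count on demand instead of rewriting the stored state from the timer callback) can be sketched as a read-only helper. This is an illustration only: `WdogSketch` and `wdog_count_at()` are hypothetical names, not the device model's fields, and saturating at UINT32_MAX is an assumption made for the sketch rather than documented hardware behaviour.

```c
#include <stdint.h>

/* Hypothetical snapshot taken when the guest last wrote the count. */
typedef struct {
    uint32_t count;       /* stored count at the last guest write */
    int64_t  last_ns;     /* virtual-clock time of that write */
    int64_t  ns_per_tick; /* tick period, assumed to be > 0 */
} WdogSketch;

/*
 * Compute the current count without modifying the snapshot, so a nested
 * call from a timer callback cannot leave count and last_ns mismatched.
 */
static uint32_t wdog_count_at(const WdogSketch *w, int64_t now_ns)
{
    if (now_ns <= w->last_ns) {
        return w->count;
    }
    uint64_t ticks = (uint64_t)(now_ns - w->last_ns) /
                     (uint64_t)w->ns_per_tick;
    uint64_t sum = (uint64_t)w->count + ticks;
    return sum > UINT32_MAX ? UINT32_MAX : (uint32_t)sum;
}
```

Because the helper takes a const pointer, the bark handler could compare its result against the threshold without touching the stored pair; only a guest write would update the snapshot.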
My initial concern about this is that the HW does the comparison check to
determine a bark/bite occurred, which is why I think the count should be
updated on a timer expiration,


>
> > +s->regs[R_WDOG_COUNT] >= s->regs[R_WDOG_BARK_THOLD]) {
> > +s->regs[R_INTR_STATE] |= (1 << 1);
> > +qemu_irq_raise(s->bark_irq);
> > +}
> > +}
> > +
>
>
> THX  Eddie
>
Thanks,
-Tyler

Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd

2022-09-27 Thread Sean Christopherson

On Mon, Sep 26, 2022, David Hildenbrand wrote:
> On 26.09.22 16:48, Kirill A. Shutemov wrote:
> > On Mon, Sep 26, 2022 at 12:35:34PM +0200, David Hildenbrand wrote:
> > > When using DAX, what happens with the shared <->private conversion? Which
> > > "type" is supposed to use dax, which not?
> > > 
> > > In other word, I'm missing too many details on the bigger picture of how
> > > this would work at all to see why it makes sense right now to prepare for
> > > that.
> > 
> > IIUC, KVM doesn't really care about pages or folios. They need PFN to
> > populate SEPT. Returning page/folio would make KVM do additional steps to
> > extract PFN and one more place to have a bug.
> 
> Fair enough. Smells KVM specific, though.

TL;DR: I'm good with either approach, though providing a "struct page" might
avoid refactoring the API in the nearish future.

Playing devil's advocate for a second, the counter argument is that KVM is the
only user for the foreseeable future.

That said, it might make sense to return a "struct page" from the core API and
force KVM to do page_to_pfn().  KVM already does that for HVA-based memory, so
it's not exactly new code.

More importantly, KVM may actually need/want the "struct page" in the
not-too-distant future to support mapping non-refcounted "struct page" memory
into the guest.  The ChromeOS folks have a use case involving virtio-gpu blobs
where KVM can get handed a "struct page" that _isn't_ refcounted[*].  Once the
lack of mmu_notifier integration is fixed, the remaining issue is that KVM
doesn't currently have a way to determine whether or not it holds a reference
to the page.  Instead, KVM assumes that if the page is "normal", it's
refcounted, e.g. see kvm_release_pfn_clean().

KVM's current workaround for this is to refuse to map these pages into the
guest, i.e. KVM simply forces its assumption that normal pages are refcounted
to be true.
To remove that workaround, the likely solution will be to pass around a tuple of
page+pfn, where "page" is non-NULL if the pfn is a refcounted "struct page".

At that point, getting handed a "struct page" from the core API would be a good
thing as KVM wouldn't need to probe the PFN to determine whether or not it's a
refcounted page.
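
As a rough illustration of the page+pfn tuple idea, the sketch below uses a stand-in type instead of the kernel's "struct page". All names here (`page_sketch`, `pfn_and_page`, `release_pfn()`) are hypothetical and do not correspond to existing KVM code; the only point is that a non-NULL page pointer tells the caller a reference is held.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct page; only a refcount is modeled. */
struct page_sketch {
    int refcount;
};

/* Hypothetical tuple: page is non-NULL only for a refcounted page. */
struct pfn_and_page {
    uint64_t pfn;
    struct page_sketch *page;
};

/* Drop the reference if one is held; no PFN probing is required. */
static void release_pfn(struct pfn_and_page *m)
{
    if (m->page) {
        m->page->refcount--;
        m->page = NULL;
    }
}
```

A mapping of non-refcounted memory would carry page == NULL, so releasing it is a no-op.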

Note, I still want the order to be provided by the API so that KVM doesn't need
to run through a bunch of helpers to try and figure out the allowed mapping size.

[*] https://lore.kernel.org/all/CAD=HUj736L5oxkzeL2JoPV8g1S6Rugy_TquW=prt73ymfzp...@mail.gmail.com

Re: [PATCH][RESEND] hyperv: fix SynIC SINT assertion failure on guest reset

2022-09-27 Thread Paolo Bonzini

Why does this need to be a virtual function, if it is the same for all CPUs
(it differs between system and user-mode emulation, but it is never called
by user-mode emulation so that does not matter)?

Paolo

Il mar 27 set 2022, 17:12 Maciej S. Szmigiero 
ha scritto:

> From: "Maciej S. Szmigiero" 
>
> Resetting a guest that has Hyper-V VMBus support enabled triggers a QEMU
> assertion failure:
> hw/hyperv/hyperv.c:131: synic_reset: Assertion
> `QLIST_EMPTY(&synic->sint_routes)' failed.
>
> This happens both on normal guest reboot or when using "system_reset" HMP
> command.
>
> The failing assertion was introduced by commit 64ddecc88bcf ("hyperv:
> SControl is optional to enable SynIc")
> to catch dangling SINT routes on SynIC reset.
>
> The root cause of this problem is that the SynIC itself is reset before
> devices using SINT routes have chance to clean up these routes.
>
> Since there seems to be no existing mechanism to force reset callbacks (or
> methods) to be executed in specific order let's use a similar method that
> is already used to reset another interrupt controller (APIC) after devices
> have been reset - by invoking the SynIC reset from the machine reset
> handler via a new "after_reset" X86 CPU method.
>
> Fixes: 64ddecc88bcf ("hyperv: SControl is optional to enable SynIc") #
> exposed the bug
> Signed-off-by: Maciej S. Szmigiero 
> ---
>  hw/i386/pc.c   |  6 ++
>  target/i386/cpu-qom.h  |  2 ++
>  target/i386/cpu.c  | 10 ++
>  target/i386/kvm/hyperv.c   |  4 
>  target/i386/kvm/kvm.c  | 24 +---
>  target/i386/kvm/kvm_i386.h |  1 +
>  6 files changed, 40 insertions(+), 7 deletions(-)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 566accf7e6..e44f11efb3 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1850,6 +1850,7 @@ static void pc_machine_reset(MachineState *machine)
>  {
>  CPUState *cs;
>  X86CPU *cpu;
> +const X86CPUClass *cpuc;
>
>  qemu_devices_reset();
>
> @@ -1858,6 +1859,11 @@ static void pc_machine_reset(MachineState *machine)
>   */
>  CPU_FOREACH(cs) {
>  cpu = X86_CPU(cs);
> +cpuc = X86_CPU_GET_CLASS(cpu);
> +
> +if (cpuc->after_reset) {
> +cpuc->after_reset(cpu);
> +}
>
>  if (cpu->apic_state) {
>  device_legacy_reset(cpu->apic_state);
> diff --git a/target/i386/cpu-qom.h b/target/i386/cpu-qom.h
> index c557a522e1..339d23006a 100644
> --- a/target/i386/cpu-qom.h
> +++ b/target/i386/cpu-qom.h
> @@ -43,6 +43,7 @@ typedef struct X86CPUModel X86CPUModel;
>   * @static_model: See CpuDefinitionInfo::static
>   * @parent_realize: The parent class' realize handler.
>   * @parent_reset: The parent class' reset handler.
> + * @after_reset: Reset handler to be called only after all other devices
> have been reset.
>   *
>   * An x86 CPU model or family.
>   */
> @@ -68,6 +69,7 @@ struct X86CPUClass {
>  DeviceRealize parent_realize;
>  DeviceUnrealize parent_unrealize;
>  DeviceReset parent_reset;
> +void (*after_reset)(X86CPU *cpu);
>  };
>
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 1db1278a59..c908b944bd 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6034,6 +6034,15 @@ static void x86_cpu_reset(DeviceState *dev)
>  #endif
>  }
>
> +static void x86_cpu_after_reset(X86CPU *cpu)
> +{
> +#ifndef CONFIG_USER_ONLY
> +if (kvm_enabled()) {
> +kvm_arch_after_reset_vcpu(cpu);
> +}
> +#endif
> +}
> +
>  static void mce_init(X86CPU *cpu)
>  {
>  CPUX86State *cenv = &cpu->env;
> @@ -7099,6 +7108,7 @@ static void x86_cpu_common_class_init(ObjectClass
> *oc, void *data)
>  device_class_set_props(dc, x86_cpu_properties);
>
>  device_class_set_parent_reset(dc, x86_cpu_reset, &xcc->parent_reset);
> +xcc->after_reset = x86_cpu_after_reset;
>  cc->reset_dump_flags = CPU_DUMP_FPU | CPU_DUMP_CCOP;
>
>  cc->class_by_name = x86_cpu_class_by_name;
> diff --git a/target/i386/kvm/hyperv.c b/target/i386/kvm/hyperv.c
> index 9026ef3a81..e3ac978648 100644
> --- a/target/i386/kvm/hyperv.c
> +++ b/target/i386/kvm/hyperv.c
> @@ -23,6 +23,10 @@ int hyperv_x86_synic_add(X86CPU *cpu)
>  return 0;
>  }
>
> +/*
> + * All devices possibly using SynIC have to be reset before calling this
> to let
> + * them remove their SINT routes first.
> + */
>  void hyperv_x86_synic_reset(X86CPU *cpu)
>  {
>  hyperv_synic_reset(CPU(cpu));
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index a1fd1f5379..774484c588 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -2203,20 +2203,30 @@ void kvm_arch_reset_vcpu(X86CPU *cpu)
>  env->mp_state = KVM_MP_STATE_RUNNABLE;
>  }
>
> +/* enabled by default */
> +env->poll_control_msr = 1;
> +
> +kvm_init_nested_state(env);
> +
> +sev_es_set_reset_vector(CPU(cpu));
> +}
> +
> +void kvm_arch_after_reset_vcpu(X86CPU *cpu)
> +{
> +CPUX86State *env = &cpu->env;
> +int i;
> +
> +/*
> +

Re: [RFC PATCH] tests/qtest: bump up QOS_PATH_MAX_ELEMENT_SIZE

2022-09-27 Thread Paolo Bonzini

What is an example of one such huge path? This would mean that LTO is
changing the set of tests that are run, which is unexpected.

Paolo

Il mar 27 set 2022, 23:35 Alex Bennée  ha scritto:

> It seems the depth of path we need to support can vary depending on
> the order of the init constructors getting called. It seems
> --enable-lto shuffles things around just enough to push you over the
> limit.
>
> Signed-off-by: Alex Bennée 
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1186
> ---
>  tests/qtest/libqos/qgraph.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/qtest/libqos/qgraph.h b/tests/qtest/libqos/qgraph.h
> index 6e94824d09..5c0046e989 100644
> --- a/tests/qtest/libqos/qgraph.h
> +++ b/tests/qtest/libqos/qgraph.h
> @@ -24,7 +24,7 @@
>  #include "libqos-malloc.h"
>
>  /* maximum path length */
> -#define QOS_PATH_MAX_ELEMENT_SIZE 50
> +#define QOS_PATH_MAX_ELEMENT_SIZE 64
>
>  typedef struct QOSGraphObject QOSGraphObject;
>  typedef struct QOSGraphNode QOSGraphNode;
> --
> 2.34.1
>
>

Re: [PATCH v2 07/11] acpi/tests/bits: add python test that exercizes QEMU bios tables using biosbits

2022-09-27 Thread Paolo Bonzini

Il mer 28 set 2022, 00:09 Michael S. Tsirkin  ha scritto:

> On Tue, Sep 27, 2022 at 11:44:56PM +0200, Paolo Bonzini wrote:
> > I also second the idea of using avocado instead of pytest, by the way.
> >
> > Paolo
>
> I do not think this is a good fit for bios tests.
> bios tests are intended for a wide audience of ACPI developers
> across a variety of host systems. They basically do not need anything
> from the host and they need to be super easy to configure
> since we have lots of drive through contributors.
>

The setup would be the same, with avocado installed in a virtual
environment via pip. It doesn't need to be set up outside, neither with
distro packages nor in ~/.local, and especially it is not necessary to deal
with avocado-vt.

Paolo

Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd

2022-09-27 Thread Sean Christopherson

On Mon, Sep 26, 2022, Fuad Tabba wrote:
> Hi,
> 
> On Mon, Sep 26, 2022 at 3:28 PM Chao Peng  wrote:
> >
> > On Fri, Sep 23, 2022 at 04:19:46PM +0100, Fuad Tabba wrote:
> > > > Then on the KVM side, its mmap_start() + mmap_end() sequence would:
> > > >
> > > >   1. Not be supported for TDX or SEV-SNP because they don't allow 
> > > > adding non-zero
> > > >  memory into the guest (after pre-boot phase).
> > > >
> > > >   2. Be mutually exclusive with shared<=>private conversions, and is 
> > > > allowed if
> > > >  and only if the entire gfn range of the associated memslot is 
> > > > shared.
> > >
> > > In general I think that this would work with pKVM. However, limiting
> > > private<->shared conversions to the granularity of a whole memslot
> > > might be difficult to handle in pKVM, since the guest doesn't have the
> > > concept of memslots. For example, in pKVM right now, when a guest
> > > shares back its restricted DMA pool with the host it does so at the
> > > page-level.

Y'all are killing me :-)

Isn't the guest enlightened?  E.g. can't you tell the guest "thou shalt share at
granularity X"?  With KVM's newfangled scalable memslots and per-vCPU MRU slot,
X doesn't even have to be that high to get reasonable performance, e.g. assuming
the DMA pool is at most 2GiB, that's "only" 1024 memslots, which is supposed to
work just fine in KVM.
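
The arithmetic behind that figure can be checked with a small helper. The 2 MiB granularity is an inference from the numbers above (2 GiB divided by 1024 slots), not something stated in the thread, and `memslots_needed()` is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Number of memslots needed to cover a pool at a given share granularity. */
static uint64_t memslots_needed(uint64_t pool_bytes, uint64_t granule_bytes)
{
    assert(granule_bytes > 0);
    /* Round up so that a partial trailing granule still gets a slot. */
    return (pool_bytes + granule_bytes - 1) / granule_bytes;
}
```

For a 2 GiB pool, memslots_needed(2ULL << 30, 2ULL << 20) evaluates to 1024, whereas page-level (4 KiB) sharing of the same pool would need 524288 slots.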

> > > pKVM would also need a way to make an fd accessible again
> > > when shared back, which I think isn't possible with this patch.
> >
> > But does pKVM really want to mmap/munmap a new region at the page-level,
> > that can cause VMA fragmentation if the conversion is frequent as I see.
> > Even with a KVM ioctl for mapping as mentioned below, I think there will
> > be the same issue.
> 
> pKVM doesn't really need to unmap the memory. What is really important
> is that the memory is not GUP'able.

Well, not entirely unguppable, just unguppable without a magic FOLL_* flag,
otherwise KVM wouldn't be able to get the PFN to map into guest memory.

The problem is that gup() and "mapped" are tied together.  So yes, pKVM doesn't
strictly need to unmap memory _in the untrusted host_, but since mapped==guppable,
the end result is the same.

Emphasis above because pKVM still needs to unmap the memory _somewhere_.  IIUC, the
current approach is to do that only in the stage-2 page tables, i.e. only in the
context of the hypervisor.  Which is also the source of the gup() problems; the
untrusted kernel is blissfully unaware that the memory is inaccessible.

Any approach that moves some of that information into the untrusted kernel so that
the kernel can protect itself will incur fragmentation in the VMAs.  Well, unless
all of guest memory becomes unguppable, but that's likely not a viable option.

[PULL 2/2] vfio/common: Fix vfio_iommu_type1_info use after free

2022-09-27 Thread Alex Williamson

On error, vfio_get_iommu_info() frees and clears *info, but
vfio_connect_container() continues to use the pointer regardless
of the return value.  Restructure the code such that a failure
of this function triggers an error and clean up the remainder of
the function, including updating an outdated comment that had
drifted from its relevant line of code and using host page size
for a default for better compatibility on non-4KB systems.

Reported-by: Nicolin Chen 
Link: https://lore.kernel.org/all/20220910004245.2878-1-nicol...@nvidia.com/
Signed-off-by: Alex Williamson 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Nicolin Chen 
Tested-by: Nicolin Chen 
Link: https://lore.kernel.org/r/166326219630.3388898.12882473157184946072.stgit@omen
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c |   36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ace9562a9ba1..6b5d8c0bf694 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -2111,29 +2111,31 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 {
 struct vfio_iommu_type1_info *info;
 
-/*
- * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
- * IOVA whatsoever.  That's not actually true, but the current
- * kernel interface doesn't tell us what it can map, and the
- * existing Type1 IOMMUs generally support any IOVA we're
- * going to actually try in practice.
- */
 ret = vfio_get_iommu_info(container, &info);
+if (ret) {
+error_setg_errno(errp, -ret, "Failed to get VFIO IOMMU info");
+goto enable_discards_exit;
+}
 
-if (ret || !(info->flags & VFIO_IOMMU_INFO_PGSIZES)) {
-/* Assume 4k IOVA page size */
-info->iova_pgsizes = 4096;
+if (info->flags & VFIO_IOMMU_INFO_PGSIZES) {
+container->pgsizes = info->iova_pgsizes;
+} else {
+container->pgsizes = qemu_real_host_page_size();
 }
-vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes);
-container->pgsizes = info->iova_pgsizes;
 
-/* The default in the kernel ("dma_entry_limit") is 65535. */
-container->dma_max_mappings = 65535;
-if (!ret) {
-vfio_get_info_dma_avail(info, &container->dma_max_mappings);
-vfio_get_iommu_info_migration(container, info);
+if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
+container->dma_max_mappings = 65535;
 }
+vfio_get_iommu_info_migration(container, info);
 g_free(info);
+
+/*
+ * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
+ * information to get the actual window extent rather than assume
+ * a 64-bit IOVA address space.
+ */
+vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
+
 break;
 }
 case VFIO_SPAPR_TCE_v2_IOMMU:
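
The pattern the patch above moves to (check the return value before touching the out-pointer, and fall back to the host page size when no page-size mask is reported) can be shown in isolation. This is a self-contained sketch with made-up names (`info_sketch`, `get_info()`, `connect_sketch()`, `INFO_PGSIZES`); it mimics the helper's free-and-clear-on-error contract but is not the VFIO code itself.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define INFO_PGSIZES (1u << 0) /* hypothetical "page sizes valid" flag */

struct info_sketch {
    uint32_t flags;
    uint64_t iova_pgsizes;
};

/* Mimics the helper's contract: on failure *info is freed and cleared. */
static int get_info(int fail, uint32_t flags, uint64_t pgsizes,
                    struct info_sketch **info)
{
    *info = calloc(1, sizeof(**info));
    if (!*info) {
        return -ENOMEM;
    }
    if (fail) {
        free(*info);
        *info = NULL;
        return -EINVAL;
    }
    (*info)->flags = flags;
    (*info)->iova_pgsizes = pgsizes;
    return 0;
}

/* Caller: bail out on error instead of dereferencing a freed pointer. */
static int connect_sketch(int fail, uint32_t flags, uint64_t pgsizes,
                          uint64_t host_page_size, uint64_t *out_pgsizes)
{
    struct info_sketch *info;
    int ret = get_info(fail, flags, pgsizes, &info);

    if (ret) {
        return ret; /* info is NULL here and must not be used */
    }
    *out_pgsizes = (info->flags & INFO_PGSIZES) ? info->iova_pgsizes
                                                : host_page_size;
    free(info);
    return 0;
}
```

On the failure path the output value is left untouched, which is the behaviour the restructured function relies on when it jumps to its error label.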

[PULL 1/2] vfio/migration: Fix incorrect initialization value for parameters in VFIOMigration

2022-09-27 Thread Alex Williamson

From: Kunkun Jiang 

The structure VFIOMigration of a VFIODevice is allocated and initialized
in vfio_migration_init(). "device_state" and "vm_running" are initialized
to 0, indicating that VFIO device is _STOP and VM is not-running. The
initialization value is incorrect. According to the agreement, default
state of VFIO device is _RUNNING. And if a VFIO device is hot-plugged
while the VM is running, "vm_running" should be 1. This patch fixes it.

Fixes: 02a7e71b1e5b ("vfio: Add VM state change handler to know state of VM")
Signed-off-by: Kunkun Jiang 
Link: https://lore.kernel.org/r/20220711014651.1327-1-jiangkun...@huawei.com
Signed-off-by: Alex Williamson 
---
 hw/vfio/migration.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index a6ad1f894561..3de4252111ee 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -806,6 +806,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 }
 
 vbasedev->migration = g_new0(VFIOMigration, 1);
+vbasedev->migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+vbasedev->migration->vm_running = runstate_is_running();
 
 ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
 info->index, "migration");

[PULL 0/2] VFIO updates 2022-09-27

2022-09-27 Thread Alex Williamson

The following changes since commit dbc4f48b5ab3e6d85f78aa4df6bd6ad561c3d152:

  Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging (2022-09-27 11:08:36 -0400)

are available in the Git repository at:

  https://gitlab.com/alex.williamson/qemu.git tags/vfio-updates-20220927.1

for you to fetch changes up to 85b6d2b5fc25c9c0d10d493b3728183ab8f8e68a:

  vfio/common: Fix vfio_iommu_type1_info use after free (2022-09-27 14:26:42 -0600)


VFIO updates 2022-09-27

 * Fix initial values for migration state (Kunkun Jiang)

 * Fix a use-after-free error path (Alex Williamson)


Alex Williamson (1):
  vfio/common: Fix vfio_iommu_type1_info use after free

Kunkun Jiang (1):
  vfio/migration: Fix incorrect initialization value for parameters in VFIOMigration

 hw/vfio/common.c| 36 +++-
 hw/vfio/migration.c |  2 ++
 2 files changed, 21 insertions(+), 17 deletions(-)

Re: [RFC PATCH v2 13/29] target/ppc: remove unused interrupts from p8_pending_interrupt

2022-09-27 Thread Fabiano Rosas

Matheus Ferst  writes:

> Remove the following unused interrupts from the POWER8 interrupt masking
> method:
> - PPC_INTERRUPT_RESET: only raised for 6xx, 7xx, 970, and POWER5p;
> - Debug Interrupt: removed in Power ISA v2.07;
> - Hypervisor Virtualization: introduced in Power ISA v3.0;
> - Critical Input, Watchdog Timer, and Fixed Interval Timer: only defined
>   for embedded CPUs;
> - Hypervisor Doorbell, Doorbell, and Critical Doorbell: processor does

We still need the first two.
0xe80 - Directed hypervisor doorbell
0xa00 - Directed privileged doorbell

>   not implement the "Embedded.Processor Control" category;
> - Programmable Interval Timer: 40x-only;
> - PPC_INTERRUPT_THERM: only raised for 970 and POWER5p;
>

Re: [PATCH v2 07/11] acpi/tests/bits: add python test that exercizes QEMU bios tables using biosbits

2022-09-27 Thread Michael S. Tsirkin

On Tue, Sep 27, 2022 at 11:44:56PM +0200, Paolo Bonzini wrote:
> I also second the idea of using avocado instead of pytest, by the way.
> 
> Paolo

I do not think this is a good fit for bios tests.
bios tests are intended for a wide audience of ACPI developers
across a variety of host systems. They basically do not need anything
from the host and they need to be super easy to configure
since we have lots of drive through contributors.

Problem is I don't think avocado is yet at the level where I can
ask random developers to use it to check their ACPI patches.

I just went ahead and rechecked and the situation isn't much better
yet. I think the focus of avocado is system testing of full guests with
KVM, not unit testing of ACPI.

Let's start with installation on a clean box:

following
https://avocado-framework.readthedocs.io/en/98.0/guides/user/chapters/installing.html

Ugh pip, will install a bunch of stuff in ~/.local and ask me to tweak
PATH ... and what about security? No thanks!

So ...
do I want LTS or latest? Well, I dunno, let's try LTS?

$ dnf module enable avocado:82lts
[sudo] password for mst: 
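Last metadata expiration check: 6 days, 15:20:21 ago on Wed 21 Sep 2022 02:33:31 AM EDT.
Dependencies resolved.
================================================================================
 Package          Architecture     Version          Repository            Size
================================================================================
Enabling module streams:
 avocado                           82lts

Transaction Summary
================================================================================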

Is this ok [y/N]: y
Complete!
[mst@tuck linux]$  dnf module install avocado
Last metadata expiration check: 6 days, 15:20:41 ago on Wed 21 Sep 2022 02:33:31 AM EDT.
No default profiles for module avocado:82lts. Available profiles: default, minimal
Error: Problems in request:
broken groups or modules: avocado

Ugh I guess latest then?

[mst@tuck linux]$ dnf module enable avocado:latest
Last metadata expiration check: 6 days, 15:25:21 ago on Wed 21 Sep 2022 02:33:31 AM EDT.
Dependencies resolved.
The operation would result in switching of module 'avocado' stream '82lts' to stream 'latest'
Error: It is not possible to switch enabled streams of a module unless explicitly enabled via configuration option module_stream_switch.
It is recommended to rather remove all installed content from the module, and reset the module using 'dnf module reset ' command. After you reset the module, you can install the other stream.

Scary ... I don't really know what streams are, and I am guessing the module
is avocado here? And what will this affect? Oh well, I'll risk this:

[mst@tuck linux]$ dnf module reset  avocado
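Last metadata expiration check: 6 days, 15:25:46 ago on Wed 21 Sep 2022 02:33:31 AM EDT.
Dependencies resolved.
================================================================================
 Package          Architecture     Version          Repository            Size
================================================================================
Resetting modules:
 avocado

Transaction Summary
================================================================================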

Is this ok [y/N]: y
Complete!
[mst@tuck linux]$ dnf module enable avocado:latest
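Last metadata expiration check: 6 days, 15:25:55 ago on Wed 21 Sep 2022 02:33:31 AM EDT.
Dependencies resolved.
================================================================================
 Package          Architecture     Version          Repository            Size
================================================================================
Enabling module streams:
 avocado                           latest

Transaction Summary
================================================================================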

Is this ok [y/N]: y
Complete!
[mst@tuck linux]$  dnf module install avocado
Last metadata expiration check:

Re: [PATCH v2] hw/virtio/vhost-shadow-virtqueue: Silence GCC error "maybe-uninitialized"

2022-09-27 Thread B




On 20 September 2022 05:29:25 UTC, Bernhard Beschow wrote:
>On 10 September 2022 15:11:17 UTC, Bernhard Beschow wrote:
>>GCC issues a false positive warning, resulting in build failure with -Werror:
>>
>>  In file included from /usr/include/glib-2.0/glib.h:114,
>>   from src/include/glib-compat.h:32,
>>   from src/include/qemu/osdep.h:144,
>>   from ../src/hw/virtio/vhost-shadow-virtqueue.c:10:
>>  In function ‘g_autoptr_cleanup_generic_gfree’,
>>  inlined from ‘vhost_handle_guest_kick’ at 
>> ../src/hw/virtio/vhost-shadow-virtqueue.c:292:42:
>>  /usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: error: ‘elem’ may be 
>> used uninitialized [-Werror=maybe-uninitialized]
>> 28 |   g_free (*pp);
>>|   ^~~~
>>  ../src/hw/virtio/vhost-shadow-virtqueue.c: In function 
>> ‘vhost_handle_guest_kick’:
>>  ../src/hw/virtio/vhost-shadow-virtqueue.c:292:42: note: ‘elem’ was declared 
>> here
>>292 | g_autofree VirtQueueElement *elem;
>>|  ^~~~
>>  cc1: all warnings being treated as errors
>>
>>There is actually no problem since "elem" is initialized in both branches.
>>Silence the warning by initializing it with "NULL".
>>
>>$ gcc --version
>>gcc (GCC) 12.2.0
>>
>>Fixes: 9c2ab2f1ec333be8614cc12272d4b91960704dbe ("vhost: stop transfer elem 
>>ownership in vhost_handle_guest_kick")
>>Signed-off-by: Bernhard Beschow 
>>---
>
>Ping

Ping2

>
>> hw/virtio/vhost-shadow-virtqueue.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
>>b/hw/virtio/vhost-shadow-virtqueue.c
>>index e8e5bbc368..596d4434d2 100644
>>--- a/hw/virtio/vhost-shadow-virtqueue.c
>>+++ b/hw/virtio/vhost-shadow-virtqueue.c
>>@@ -289,7 +289,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue 
>>*svq)
>> virtio_queue_set_notification(svq->vq, false);
>> 
>> while (true) {
>>-g_autofree VirtQueueElement *elem;
>>+g_autofree VirtQueueElement *elem = NULL;
>> int r;
>> 
>> if (svq->next_guest_avail_elem) {
>

RE: [PATCH v2 1/3] hw/watchdog: wdt_ibex_aon.c: Implement the watchdog for the OpenTitan

2022-09-27 Thread Dong, Eddie

Hi Tyler:

> +}
> +
> +/* Called when the bark timer expires */
> +static void ibex_aon_barker_expired(void *opaque)
> +{
This may happen during ibex_aon_update_count(), right? 

> +    IbexAONTimerState *s = IBEX_AON_TIMER(opaque);
> +    if (ibex_aon_update_count(s) &&
This may happen during ibex_aon_update_count().
Nested call may lead to incorrect s->regs[R_WDOG_COUNT] & s->wdog_last. 

Can you elaborate? The timers for bark and bite are not updated in 
"update_count".

While the first ibex_aon_update_count() is executing and has reached point PPP
(after updating s->regs[R_WDOG_COUNT] but before updating s->wdog_last), the
bark timer may fire.
ibex_aon_barker_expired() then calls ibex_aon_update_count() again (a nested
call) and updates both s->regs[R_WDOG_COUNT] and s->wdog_last with new values.
When the nested call returns to PPP, the outer call overwrites s->wdog_last
with the timestamp from the first execution, so s->regs[R_WDOG_COUNT] and
s->wdog_last no longer match.

This may not be triggered in the normal case, but if the guest reads
A_WDOG_COUNT, the first ibex_aon_update_count() does execute and exposes the
potential issue.

I think we can solve the problem by not updating s->regs[R_WDOG_COUNT] and
s->wdog_last in the timer callback. The update is not necessary, given that
the stored value is not the current COUNT anyway. We only need to update
when the guest writes the COUNT.
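
To make the interleaving described above concrete, here is a toy Python model.
It is not the device code: ToyWatchdog, update_count and the at_ppp hook are
names invented for illustration, and the model assumes update_count adds the
elapsed time to the count before it stores the timestamp, which may not match
the real ibex_aon_update_count(). It also simply assumes the timer callback can
run at point PPP.

```python
class ToyWatchdog:
    """Toy stand-in for the count/last pair; not the real device model."""

    def __init__(self):
        self.count = 0  # plays the role of s->regs[R_WDOG_COUNT]
        self.last = 0   # plays the role of s->wdog_last

    def update_count(self, now, at_ppp=None):
        # Step 1: account for the time elapsed since the last update.
        self.count += now - self.last
        # Point PPP: count is already updated, last is not yet.
        if at_ppp is not None:
            at_ppp()
        # Step 2: remember when the count was last brought up to date.
        self.last = now


def run_nested():
    """Outer update at t=10, hypothetical timer callback at PPP with t=12."""
    w = ToyWatchdog()
    w.update_count(10, at_ppp=lambda: w.update_count(12))
    return w.count, w.last


def run_sequential():
    """The same two updates, one after the other, for comparison."""
    w = ToyWatchdog()
    w.update_count(10)
    w.update_count(12)
    return w.count, w.last
```

Under these assumptions run_nested() ends with count=22 and last=10, while
run_sequential() ends with count=12 and last=12, which is the kind of mismatch
described above.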


> +        s->regs[R_WDOG_COUNT] >= s->regs[R_WDOG_BARK_THOLD]) {
> +        s->regs[R_INTR_STATE] |= (1 << 1);
> +        qemu_irq_raise(s->bark_irq);
> +    }
> +}
> +


THX  Eddie

Re: qemu and -vga vs. -device

2022-09-27 Thread Adam Williamson

On Tue, 2022-09-27 at 13:34 -0300, Daniel Henrique Barboza wrote:
> Hi Adam,
> 
> On 9/26/22 06:26, Gerd Hoffmann wrote:
> > On Sat, Sep 24, 2022 at 12:12:45AM -0700, Adam Williamson wrote:
> > > On Mon, 2022-09-19 at 06:42 +0200, Gerd Hoffmann wrote:
> > > > On Fri, Sep 16, 2022 at 10:02:17AM -0700, Adam Williamson wrote:
> > > > > Hi Gerd!
> > > > > 
> > > > > > I'm working on a patch to revise how openQA sets video devices in
> > > > > > qemu. In that context, a question: if we always want to specify a
> > > > > > single video device with `-device` (e.g. `-device VGA` or
> > > > > > `-device virtio-vga`), should we also specify `-vga none` to ensure
> > > > > > qemu doesn't also include another adapter as a default for the -vga
> > > > > > arg?
> > > > 
> > > > Doesn't hurt to include it.  In theory it should not be needed, qemu has
> > > > a list of vga devices and in case '-device $vga' is found on the command
> > > > line will turn off the default vga device automatically.  In practice
> > > > there are qemu versions where this list is not complete, so it
> > > > sometimes doesn't work as intended.
> > > > 
> > > > Alternatively use '-nodefaults' which will disable all automatically
> > > > added devices (vga, nic, cdrom, ...).
> > > 
> > > Thanks Gerd!
> > > 
> > > > So, I got around to testing this today, and found something
> > > > interesting. On ppc64le, adding `-vga none` seems to break things.
> > > > Booting a Fedora installer ISO, which should show the boot menu with a
> > > > 60 second timeout then boot to the installer, if we run the VM with
> > > > `-vga std`, we see the bootloader. If we run it with `-device VGA` and
> > > > no `-vga` arg, we see the bootloader. But if we run qemu with
> > > > `-vga none -device VGA`, we don't see the bootloader. The system just
> > > > sits at the OFW init screen apparently forever (I thought it might
> > > > actually be running in the background and recover to anaconda after the
> > > > 60 second boot timeout, but it doesn't seem to).
> > > 
> > > Not sure what's going on there, but thought you might be interested.
> 
> Can you please send the full command line you're using?

Hi Daniel! Here it is:

/usr/bin/qemu-system-ppc64 -vga none -device VGA,edid=on,xres=1024,yres=768 -g 
1024x768 -only-migratable -chardev 
ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 
-audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 
-global isa-fdc.fdtypeA=none -m 4096 -machine 
usb=off,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off -cpu 
host -netdev user,id=qanet0,net=172.16.2.0/24 -device 
virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -object 
rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -boot 
once=d -device nec-usb-xhci -device usb-tablet -device usb-kbd -smp 1 
-enable-kvm -no-shutdown -vnc :99,share=force-shared -device virtio-serial 
-chardev 
pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on
 -device 
virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console 
-chardev 
pipe,id=virtio_console1,path=virtio_console1,logfile=virtio_console1.log,logappend=on
 -device 
virtconsole,chardev=virtio_console1,name=org.openqa.console.virtio_console1 
-chardev 
socket,path=qmp_socket,server=on,wait=off,id=qmp_socket,logfile=qmp_socket.log,logappend=on
 -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev 
driver=file,node-name=hd0-file,filename=/var/lib/openqa/pool/9/raid/hd0,cache.no-flush=on
 -blockdev 
driver=qcow2,node-name=hd0,file=hd0-file,cache.no-flush=on,discard=unmap 
-device virtio-blk,id=hd0-device,drive=hd0,serial=hd0 -blockdev 
driver=file,node-name=cd0-overlay0-file,filename=/var/lib/openqa/pool/9/raid/cd0-overlay0,cache.no-flush=on
 -blockdev 
driver=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on,discard=unmap
 -device scsi-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0

the version without `-vga none` would be literally exactly the same but
without that one option.

Note, it looks like I was just a bit impatient in my manual trials;
looking at some jobs that ran today, they did eventually clear to the
Fedora installer GUI after about 90-120 seconds. But they definitely
don't show the bootloader (which our test system expects to see, so the
test fails). When run without the `-vga none` part, the bootloader is
shown at the same resolution and using the same fonts as the OFW
interface.
> 
> I'm actually surprised that you can combine '-vga none -display VGA' in the
> command line and have it execute without a parse error.

I found various past mailing list discussions suggesting this is a good
idea just to ensure qemu doesn't add the 'default' device (so far as
the `-vga` arg is concerned) to the specified video device. Gerd didn't
see any problem with doing it when I asked him, either.
> 
> This also works, which is also surprising to me:
> 
> 
> (launches the process with the 'curses' display)
>

RE: [PATCH v2 4/4] virtio-gpu: Don't require udmabuf when blob support is enabled

2022-09-27 Thread Kasireddy, Vivek

Hi Dmitry,

> 
> On 9/27/22 11:32, Gerd Hoffmann wrote:
> > On Mon, Sep 26, 2022 at 09:32:40PM +0300, Dmitry Osipenko wrote:
> >> On 9/23/22 15:32, Gerd Hoffmann wrote:
> >>> On Tue, Sep 13, 2022 at 12:50:22PM +0200, Antonio Caggiano wrote:
>  From: Dmitry Osipenko 
> 
>  Host blobs don't need udmabuf, it's only needed by guest blobs. The host
>  blobs are utilized by the Mesa virgl driver when persistent memory 
>  mapping
>  is needed by a GL buffer, otherwise virgl driver doesn't use blobs.
>  Persistent mapping support bumps GL version from 4.3 to 4.5 in guest.
>  Relax the udmabuf requirement.
> >>>
> >>> What about blob=on,virgl=off?
> >>>
> >>> In that case qemu manages the resources and continued to require
> >>> udmabuf.
> >>
> >> The udmabuf is used only by the blob resource-creation command in Qemu.
> >> I couldn't find when we could hit that udmabuf code path in Qemu because
> >> BLOB_MEM_GUEST resource type is used only by crosvm+Venus when crosvm
> >> uses a dedicated render-server for virglrenderer.
> >
> > Recent enough linux guest driver will use BLOB_MEM_GUEST resources
> > with blob=on + virgl=off
> 
> I reproduced this case today using "-device
> virtio-gpu-device,blob=true". You're right, it doesn't work without udmabuf.
> 
> [8.369306] virtio_gpu_dequeue_ctrl_func: 90 callbacks suppressed
> [8.369311] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response
> 0x1200 (command 0x105)
> [8.371848] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response
> 0x1205 (command 0x104)
> [8.373085] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response
> 0x1200 (command 0x105)
> [8.376273] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response
> 0x1205 (command 0x104)
> [8.416972] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response
> 0x1200 (command 0x105)
> [8.418841] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response
> 0x1205 (command 0x104)
> 
> I see now why you're wanting to keep the udmabuf requirement for
> blob=on,virgl=off.
> 
> 
> >>   - /dev/udmabuf isn't accessible by normal user
> >>   - udmabuf driver isn't shipped by all of the popular Linux distros,
> >> for example Debian doesn't ship it
> >
> > That's why blob resources are off by default.
> >
> >> Because of all of the above, I don't think it makes sense to
> >> hard-require udmabuf at the start of Qemu. It's much better to fail
> >> resource creation dynamically.
> >
> > Disagree.  When virgl/venus is enabled, then yes, qemu would let
> > virglrenderer manage resources and I'm ok with whatever requirements
> > virglrenderer has.  When qemu manages resources by itself udmabuf is
> > a hard requirement for blob support though.
> 
> Let's try to relax the udmabuf requirement only for blob=on,virgl=on.
> I'll update this patch.
[Kasireddy, Vivek] In addition to what Gerd mentioned and what you have
discovered, I wanted to briefly add that we use Udmabuf/Guest Blobs for
Headless GPU passthrough use-cases (blob=on and virgl=off). Here are some
more details about our use-case:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592#note_851314

Thanks,
Vivek

> 
> Thank you very much for the review!
> 
> --
> Best regards,
> Dmitry
>

Re: [PATCH v2 07/11] acpi/tests/bits: add python test that exercizes QEMU bios tables using biosbits

2022-09-27 Thread Paolo Bonzini

On Sun, 10 Jul 2022, 19:01 Ani Sinha wrote:

> This change adds python based test environment that can be used to run
> pytest
> from within a virtual environment. A bash script sets up a virtual
> environment
> and then runs the python based tests from within that environment.
> All dependent python packages are installed in the virtual environment
> using
> pip python module. QEMU python test modules are also available in the
> environment
> for spawning the QEMU based VMs.
>
> It also introduces QEMU acpi/smbios biosbits python test script which is
> run
> from within the python virtual environment. When the bios bits tests are
> run,
> bios bits binaries are downloaded from an external repo/location.
> Currently, the test points to an external private github repo where the
> bits
> archives are checked in.


The virtual environment should be set up from configure, similar to git
submodules. John was working on it and probably can point you at some
earlier discussions in the archives about how to do it.

I also second the idea of using avocado instead of pytest, by the way.

Paolo


> Signed-off-by: Ani Sinha 
> ---
>  tests/pytest/acpi-bits/acpi-bits-test-venv.sh |  59 +++
>  tests/pytest/acpi-bits/acpi-bits-test.py  | 382 ++
>  tests/pytest/acpi-bits/meson.build|  33 ++
>  tests/pytest/acpi-bits/requirements.txt   |   1 +
>  4 files changed, 475 insertions(+)
>  create mode 100644 tests/pytest/acpi-bits/acpi-bits-test-venv.sh
>  create mode 100644 tests/pytest/acpi-bits/acpi-bits-test.py
>  create mode 100644 tests/pytest/acpi-bits/meson.build
>  create mode 100644 tests/pytest/acpi-bits/requirements.txt
>
> diff --git a/tests/pytest/acpi-bits/acpi-bits-test-venv.sh
> b/tests/pytest/acpi-bits/acpi-bits-test-venv.sh
> new file mode 100644
> index 00..186395473b
> --- /dev/null
> +++ b/tests/pytest/acpi-bits/acpi-bits-test-venv.sh
> @@ -0,0 +1,59 @@
> +#!/usr/bin/env bash
> +# Generates a python virtual environment for the test to run.
> +# Then runs python test scripts from within that virtual environment.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see .
> +#
> +# Author: Ani Sinha 
> +
> +set -e
> +
> +MYPATH=$(realpath ${BASH_SOURCE:-$0})
> +MYDIR=$(dirname $MYPATH)
> +
> +if [ -z "$PYTEST_SOURCE_ROOT" ]; then
> +echo -n "Please set PYTEST_SOURCE_ROOT env pointing"
> +echo " to the root of the qemu source tree."
> +echo -n "This is required so that the test can find the "
> +echo "python modules that it needs for execution."
> +exit 1
> +fi
> +SRCDIR=$PYTEST_SOURCE_ROOT
> +TESTSCRIPTS=("acpi-bits-test.py")
> +PIPCMD="-m pip -q --disable-pip-version-check"
> +# we need to save the old value of PWD before we do a change-dir later
> +PYTEST_PWD=$PWD
> +
> +TESTS_PYTHON=/usr/bin/python3
> +TESTS_VENV_REQ=requirements.txt
> +
> +# sadly for pip -e and -t options do not work together.
> +# please see https://github.com/pypa/pip/issues/562
> +cd $MYDIR
> +
> +$TESTS_PYTHON -m venv .
> +$TESTS_PYTHON $PIPCMD install -e $SRCDIR/python/
> +[ -f $TESTS_VENV_REQ ] && \
> +$TESTS_PYTHON $PIPCMD install -r $TESTS_VENV_REQ || exit 0
> +
> +# venv is activated at this point.
> +
> +# run the test
> +for testscript in ${TESTSCRIPTS[@]} ; do
> +export PYTEST_PWD; python3 $testscript
> +done
> +
> +cd $PYTEST_PWD
> +
> +exit 0
> diff --git a/tests/pytest/acpi-bits/acpi-bits-test.py
> b/tests/pytest/acpi-bits/acpi-bits-test.py
> new file mode 100644
> index 00..97e61eb709
> --- /dev/null
> +++ b/tests/pytest/acpi-bits/acpi-bits-test.py
> @@ -0,0 +1,382 @@
> +#!/usr/bin/env python3
> +# group: rw quick
> +# Exercize QEMU generated ACPI/SMBIOS tables using biosbits,
> +# https://biosbits.org/
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see

[RFC PATCH] tests/qtest: bump up QOS_PATH_MAX_ELEMENT_SIZE

2022-09-27 Thread Alex Bennée

It seems the depth of path we need to support can vary depending on
the order of the init constructors getting called. It seems
--enable-lto shuffles things around just enough to push you over the
limit.

Signed-off-by: Alex Bennée 
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1186
---
 tests/qtest/libqos/qgraph.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/libqos/qgraph.h b/tests/qtest/libqos/qgraph.h
index 6e94824d09..5c0046e989 100644
--- a/tests/qtest/libqos/qgraph.h
+++ b/tests/qtest/libqos/qgraph.h
@@ -24,7 +24,7 @@
 #include "libqos-malloc.h"
 
 /* maximum path length */
-#define QOS_PATH_MAX_ELEMENT_SIZE 50
+#define QOS_PATH_MAX_ELEMENT_SIZE 64
 
 typedef struct QOSGraphObject QOSGraphObject;
 typedef struct QOSGraphNode QOSGraphNode;
-- 
2.34.1

Re: [PATCH v2 10/11] pytest: add pytest to the meson build system

2022-09-27 Thread Michael S. Tsirkin

On Tue, Sep 06, 2022 at 02:10:56PM +0100, Daniel P. Berrangé wrote:
> On Tue, Jul 12, 2022 at 12:22:10PM +0530, Ani Sinha wrote:
> > 
> > 
> > On Mon, 11 Jul 2022, John Snow wrote:
> > 
> > > On Sun, Jul 10, 2022 at 1:01 PM Ani Sinha  wrote:
> > > >
> > > > Integrate the pytest framework with the meson build system. This will 
> > > > make meson
> > > > run all the pytests under the pytest directory.
> > > >
> > > > Signed-off-by: Ani Sinha 
> > > > ---
> > > >  tests/Makefile.include   |  4 +++-
> > > >  tests/meson.build|  1 +
> > > >  tests/pytest/meson.build | 49 
> > > >  3 files changed, 53 insertions(+), 1 deletion(-)
> > > >  create mode 100644 tests/pytest/meson.build
> > > >
> > > > diff --git a/tests/Makefile.include b/tests/Makefile.include
> > > > index 3accb83b13..40755a6bd1 100644
> > > > --- a/tests/Makefile.include
> > > > +++ b/tests/Makefile.include
> > > > @@ -3,12 +3,14 @@
> > > >  .PHONY: check-help
> > > >  check-help:
> > > > @echo "Regression testing targets:"
> > > > -   @echo " $(MAKE) check  Run block, qapi-schema, 
> > > > unit, softfloat, qtest and decodetree tests"
> > > > +   @echo " $(MAKE) check  Run block, qapi-schema, 
> > > > unit, softfloat, qtest, pytest and decodetree tests"
> > >
> > > Does this mean that "make check" *requires* an internet connection?
> > 
> > No. My test will be skipped if it is unable to download the artifacts it
> > requires due to lack of Internet connectivity.
> 
> That's not the only concern, there are also people who have metered
> internet connections, or whose connections are slow and thus have
> long download times. Any test that downloads should be opt-in only.
> 
> 
> With regards,
> Daniel


This is why I wanted git submodules: a well-understood decentralized
model. Now we are reinventing them badly.
I asked at the maintainers summit what issues people have with
submodules; no one volunteered any information.
It might make sense to figure out if there's a way to
use submodules sanely.

Anyway, download should just be done separately;
make check should just verify that it has the correct binary
and fail if not.

And I'd like to have a target that fails if it cannot
run the tests, as opposed to skipping.
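
A rough Python sketch of what such a check could look like, assuming the
binary has already been fetched by a separate step. The names require_artifact
and sha256_of, the strict flag, and the idea of pinning a SHA-256 digest are
illustrative assumptions, not something from the patch series.

```python
import hashlib
import unittest
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def require_artifact(path, expected_sha256, strict=False):
    """Return `path` if the prefetched file matches the pinned digest.

    With strict=True a missing or mismatching file is a hard failure;
    otherwise the test is skipped. No download is attempted here.
    """
    path = Path(path)
    if path.is_file() and sha256_of(path) == expected_sha256:
        return path
    message = f"{path}: missing or checksum mismatch; fetch it separately"
    if strict:
        raise RuntimeError(message)
    raise unittest.SkipTest(message)
```

A strict target would call require_artifact(..., strict=True), so a missing
binary shows up as a failure rather than a silent skip.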



> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v2 07/11] acpi/tests/bits: add python test that exercizes QEMU bios tables using biosbits

2022-09-27 Thread Michael S. Tsirkin

On Tue, Sep 27, 2022 at 03:37:39PM +0530, Ani Sinha wrote:
> > > > > > >
> > > > > > > OK fine. Let's figure out how to push bits somewhere in
> > > > > > > git.qemu.org and
> > > > > > > the binaries in some other repo first. Everything else hinges on 
> > > > > > > that. We
> > > > > > > can fix the rest of the bits later incrementally.
> > > > > >
> > > > > > DanPB, any thoughts on putting bits on git.qemu.org or where and 
> > > > > > how to
> > > > > > keep the binaries?
> > > > >
> > > > > Can we please conclude on this?
> > > > > Peter, can you please fork the repo? I have tried many times to reach
> > > > > you on IRC but failed.
> > > >
> > > > Probably because of travel around KVM forum.
> > > >
> > > > I think given our CI is under pressure again due to gitlab free tier
> > > > limits, tying binaries to CI isn't a great idea at this stage.
> > > > Can Ani just upload binaries to qemu.org for now?
> > >
> > > I agree with Michael here. Having a full ci/cd job for this is
> > > overkill IMHO. We should create a repo just for the binaries, have a
> > > README there to explain how we generate them and check in new versions
> > > as and when needed (it won't be frequent).
> > > How about biosbits-bin repo?
> >
> > If QEMU is hosting binaries, where any part contains GPL code, then we
> > need to be providing the full and corresponding source and the build
> > scripts needed to re-create the binary. Once we have such scripts it
> > should be trivial to trigger that from a CI job. If it isn't then
> > we're doing something wrong.
> 
> I was thinking of committing the build scripts in a branch of
> https://gitlab.com/qemu-project/biosbits-bits.
> This would separate them from the main source. The scripts would build
> a version of qemu-bits based on the version information passed to it
> from the environment.
> Before I committed the scripts, I wanted to check whether we would
> want to do that or have a separate repo containing the binaries and
> the build scripts.
> Seems we want the former.

A separate repo is standard imho. Don't see any advantages to
abusing git branches like that.

> As for the gitlab-ci, I looked at the yaml file and the qemu ones
> looks quite complicated. Can someone help me generate one based on the
> build script here?
> https://github.com/ani-sinha/bits/blob/bits-qemu-logging/build-artifacts.sh
> 
> > The CI quota is not an issue, because
> > this is not a job that we need to run continuously. It can be triggered
> > manually as & when we decide we need to refresh the binary, so would
> > be a small one-off CI quota hit.
> >
> > Also note that gitlab is intending to start enforcing storage quota
> > on projects in the not too distant future. This makes it unappealing
> > to store binaries in git repos, unless we genuinely need the ability
> > to access historical versions of the binary. I don't believe we need
> > that for biosbits.
> >
> > The binary can be published as a CI artifact and accessed directly
> > from the latest artifact download link. This ensures we only consume
> > quota for the most recently published binary artifact. So I don't see
> > a compelling reason to upload binaries into git.
> >
> > With regards,
> > Daniel
> > --
> > |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange 
> > :|
> > |: https://libvirt.org -o-https://fstop138.berrange.com 
> > :|
> > |: https://entangle-photo.org-o-https://www.instagram.com/dberrange 
> > :|
> >

Re: [PATCH v2 07/11] acpi/tests/bits: add python test that exercizes QEMU bios tables using biosbits

2022-09-27 Thread Michael S. Tsirkin

On Tue, Sep 27, 2022 at 04:45:09PM +0100, Daniel P. Berrangé wrote:
> On Tue, Sep 27, 2022 at 07:35:13PM +0530, Ani Sinha wrote:
> > On Tue, Sep 27, 2022 at 5:12 PM Ani Sinha  wrote:
> > >
> > > On Tue, Sep 27, 2022 at 3:48 PM Daniel P. Berrangé  
> > > wrote:
> > > >
> > > > On Tue, Sep 27, 2022 at 03:37:39PM +0530, Ani Sinha wrote:
> > > > > > > > > > >
> > > > > > > > > > > OK fine. Let's figure out how to push bits somewhere in
> > > > > > > > > > > git.qemu.org and
> > > > > > > > > > > the binaries in some other repo first. Everything else 
> > > > > > > > > > > hinges on that. We
> > > > > > > > > > > can fix the rest of the bits later incrementally.
> > > > > > > > > >
> > > > > > > > > > DanPB, any thoughts on putting bits on git.qemu.org or 
> > > > > > > > > > where and how to
> > > > > > > > > > keep the binaries?
> > > > > > > > >
> > > > > > > > > Can we please conclude on this?
> > > > > > > > > Peter, can you please fork the repo? I have tried many times 
> > > > > > > > > to reach
> > > > > > > > > you on IRC but failed.
> > > > > > > >
> > > > > > > > Probably because of travel around KVM forum.
> > > > > > > >
> > > > > > > > I think given our CI is under pressure again due to gitlab free 
> > > > > > > > tier
> > > > > > > > limits, tying binaries to CI isn't a great idea at this stage.
> > > > > > > > Can Ani just upload binaries to qemu.org for now?
> > > > > > >
> > > > > > > I agree with Michael here. Having a full ci/cd job for this is
> > > > > > > overkill IMHO. We should create a repo just for the binaries, 
> > > > > > > have a
> > > > > > > README there to explain how we generate them and check in new 
> > > > > > > versions
> > > > > > > as and when needed (it won't be frequent).
> > > > > > > How about biosbits-bin repo?
> > > > > >
> > > > > > If QEMU is hosting binaries, where any part contains GPL code, then 
> > > > > > we
> > > > > > need to be providing the full and corresponding source and the build
> > > > > > scripts needed to re-create the binary. Once we have such scripts it
> > > > > > should be trivial to trigger that from a CI job. If it isn't then
> > > > > > we're doing something wrong.
> > > > >
> > > > > I was thinking of committing the build scripts in a branch of
> > > > > https://gitlab.com/qemu-project/biosbits-bits.
> > > > > This would separate them from the main source. The scripts would build
> > > > > a version of qemu-bits based on the version information passed to it
> > > > > from the environment.
> > > > > Before I committed the scripts, I wanted to check whether we would
> > > > > want to do that or have a separate repo containing the binaries and
> > > > > the build scripts.
> > > > > Seems we want the former.
> > > > >
> > > > > As for the gitlab-ci, I looked at the yaml file and the qemu ones
> > > > > looks quite complicated. Can someone help me generate one based on the
> > > > > build script here?
> > > > > https://github.com/ani-sinha/bits/blob/bits-qemu-logging/build-artifacts.sh
> > > >
> > > > Yes, QEMU's rules aren't a good place to start if you're trying
> > > > to learn gitlab CI, as they're very advanced.
> > > >
> > > > The simple case though is quite simple.
> > > >
> > > >   * You need a container image to act as the build env
> > > >   * In 'before_script'  install any packages you need on top of the
> > > > base container image for build deps
> > > >   * In 'script'  run whatever shell commands you need in order
> > > > to build the project
> > > >   * Add an 'artifacts' section to publish one (or more) files/dirs
> > > > as output
> > > >
> > > > The simplest example would be something like
> > > >
> > > >mybuild:
> > > >  image: fedora:36
> > > >  before_script:
> > > >- dnf install -y gcc
> > > >  script:
> > > >- gcc -o myapp myapp.c
> > > >  artifacts:
> > > >paths:
> > > >  - myapp
> > > >
> > >
> > > How does this look?
> > > https://pastebin.com/0YyLFmi3
> > 
> > Alright, .gitlab-ci.yml is produced and the pipeline succeeds.
> > However, the question still remains, where do we keep the generated
> > artifacts?
> 
> The following link will always reflect the published artifacts from
> the most recently fully successful CI pipeline, on the 'qemu-bits'
> branch, and 'qemu-bits-build' CI job:
> 
> https://gitlab.com/qemu-project/biosbits-bits/-/jobs/artifacts/qemu-bits/download?job=qemu-bits-build
> 
> Tweak as needed if you push the CI to master branch instead. This
> link can be considered the permanent home of the artifact. I'd just
> suggest that the QEMU job automatically skip if it fails to download
> the artifact, as occasionally transient infra errors can impact
> it.
> 
> With regards,
> Daniel

This just means that once we change the test, old qemu source can no longer
use it. Why is this a good idea? Are we so short on disk space? I thought CPU
is the limiting factor?
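
For reference, the skip-on-download-failure behaviour suggested in the quoted
text could look roughly like this in a Python test. fetch_or_skip and its
opener parameter are hypothetical names, not part of the patch, and the sketch
does nothing about the versioning question raised above.

```python
import unittest
import urllib.error
import urllib.request


def fetch_or_skip(url, dest, opener=urllib.request.urlopen, timeout=30):
    """Download `url` to `dest`, or skip the test if the download fails.

    `opener` is injectable so the function can be exercised without a
    network; by default it is urllib.request.urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            data = response.read()
    except (urllib.error.URLError, OSError) as exc:
        # Transient infrastructure errors become a skip, not a failure.
        raise unittest.SkipTest(f"could not download {url}: {exc}") from exc
    # Only write the destination once the whole body has been read.
    with open(dest, "wb") as f:
        f.write(data)
    return dest
```

The destination file is written only after a complete read, so a failed
download does not leave a partial archive behind.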

> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange

Re: [PATCH v2 07/11] acpi/tests/bits: add python test that exercizes QEMU bios tables using biosbits

2022-09-27 Thread Michael S. Tsirkin

On Tue, Sep 27, 2022 at 09:33:27AM +0100, Daniel P. Berrangé wrote:
> On Tue, Sep 27, 2022 at 01:43:15PM +0530, Ani Sinha wrote:
> > On Sun, Sep 18, 2022 at 1:58 AM Michael S. Tsirkin  wrote:
> > >
> > > On Fri, Sep 16, 2022 at 09:30:42PM +0530, Ani Sinha wrote:
> > > > On Thu, Jul 28, 2022 at 12:08 AM Ani Sinha  wrote:
> > > > >
> > > > >
> > > > >
> > > > > On Mon, 25 Jul 2022, Ani Sinha wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > On Sat, 16 Jul 2022, Michael S. Tsirkin wrote:
> > > > > >
> > > > > > > On Sat, Jul 16, 2022 at 12:06:00PM +0530, Ani Sinha wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jul 15, 2022 at 11:20 Michael S. Tsirkin 
> > > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Fri, Jul 15, 2022 at 09:47:27AM +0530, Ani Sinha wrote:
> > > > > > > > > > Instead of all this mess, can't we just spawn e.g. "git 
> > > > > > > > clone --depth
> > > > > > > > 1"?
> > > > > > > > > > And if the directory exists I would fetch and checkout.
> > > > > > > > >
> > > > > > > > > There are two reasons I can think of why I do not like 
> > > > > > > > this idea:
> > > > > > > > >
> > > > > > > > > (a) a git clone of a whole directory would download all 
> > > > > > > > versions of the
> > > > > > > > > binary whereas we want only a specific version.
> > > > > > > >
> > > > > > > > You mention shallow clone yourself, and I used --depth 1 
> > > > > > > > above.
> > > > > > > >
> > > > > > > > > Downloading a single file
> > > > > > > > > by shallow cloning or creating a git archive is overkill 
> > > > > > > > IMHO when a wget
> > > > > > > > > style retrieval works just fine.
> > > > > > > >
> > > > > > > > However, it does not provide for versioning, tagging etc so 
> > > > > > > > you have
> > > > > > > > to implement your own schema.
> > > > > > > >
> > > > > > > >
> > > > > > > > Hmm I’m not sure if we need all that. Bits has its own 
> > > > > > > > versioning mechanism and
> > > > > > > > I think all we need to do is maintain the same versioning logic 
> > > > > > > > and maintain
> > > > > > > > binaries of different  versions. Do we really need the power of 
> > > > > > > > git/version
> > > > > > > > control here? Dunno.
> > > > > > >
> > > > > > > Well we need some schema. Given we are not using official bits 
> > > > > > > releases
> > > > > > > I don't think we can reuse theirs.
> > > > > >
> > > > > > > OK fine. Let's figure out how to push bits somewhere in 
> > > > > > git.qemu.org and
> > > > > > the binaries in some other repo first. Everything else hinges on 
> > > > > > that. We
> > > > > > can fix the rest of the bits later incrementally.
> > > > >
> > > > > DanPB, any thoughts on putting bits on git.qemu.org or where and how 
> > > > > to
> > > > > keep the binaries?
> > > >
> > > > Can we please conclude on this?
> > > > Peter, can you please fork the repo? I have tried many times to reach
> > > > you on IRC but failed.
> > >
> > > Probably because of travel around KVM forum.
> > >
> > > I think given our CI is under pressure again due to gitlab free tier
> > > limits, tying binaries to CI isn't a great idea at this stage.
> > > > Can Ani just upload binaries to qemu.org for now?
> > 
> > I agree with Michael here. Having a full ci/cd job for this is
> > overkill IMHO. We should create a repo just for the binaries, have a
> > README there to explain how we generate them and check in new versions
> > as and when needed (it won't be frequent).
> > How about biosbits-bin repo?
> 
> If QEMU is hosting binaries, where any part contains GPL code, then we
> need to be providing the full and corresponding source and the build
> scripts needed to re-create the binary. Once we have such scripts it
> should be trivial to trigger that from a CI job. If it isn't then
> we're doing something wrong.  The CI quota is not an issue, because
> this is not a job that we need to run continuously. It can be triggered
> manually as & when we decide we need to refresh the binary, so would
> be a small one-off CI quota hit.
> 
> Also note that gitlab is intending to start enforcing storage quota
> on projects in the not too distant future. This makes it unappealing
> to store binaries in git repos, unless we genuinely need the ability
> to access historical versions of the binary. I don't believe we need
> that for biosbits.
> 
> The binary can be published as a CI artifact and accessed directly
> from the latest artifact download link. This ensures we only consume
> quota for the most recently published binary artifact. So I don't see
> a compelling reason to upload binaries into git.
> 
> With regards,
> Daniel

I don't really care where we upload them but only having the
latest version is just going to break anything expecting
the old binary.



> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |:

Re: [PATCH v11 18/21] job.c: enable job lock/unlock and remove Aiocontext locks

2022-09-27 Thread Paolo Bonzini

On Mon, 26 Sep 2022, 14:21 Vladimir Sementsov-Ogievskiy <
vsement...@yandex-team.ru> wrote:

> On 9/18/22 20:12, Emanuele Giuseppe Esposito wrote:
> >>> --- a/qemu-img.c
> >>> +++ b/qemu-img.c
> >>> @@ -911,7 +911,6 @@ static void run_block_job(BlockJob *job, Error
> >>> **errp)
> >>>AioContext *aio_context = block_job_get_aio_context(job);
> >>>int ret = 0;
> >>>-aio_context_acquire(aio_context);
> >>>job_lock();
> >>>job_ref_locked(&job->job);
> >>>do {
> >> aio_poll() call here, doesn't require aio_context to be acquired?
> > On the contrary I think, if you see in AIO_WAIT_WHILE we explicitly
> > release it because we don't want to allow something else to run with the
> > aiocontext acquired.
> >
>
> Still, in AIO_WAIT_WHILE() we release ctx_, but do
> aio_poll(qemu_get_aio_context(), true), so we poll in other context.
>
> But here in qemu-img.c we drop aiocontext lock exactly for aio_context,
> which is an argument of aio_poll()..
>

It's the same; the acquire/release is done again in file descriptor
callbacks or bottom halves (typically via aio_co_wake).

Paolo


> --
> Best regards,
> Vladimir
>
>

Re: [QEMU][PATCH 0/5] Introduce Xilinx Versal CANFD

2022-09-27 Thread Vikram Garhwal


Hi Peter,

Thanks for pointing this out. It looks like my configs are outdated. I will
take care of this for v2.


Regards,

Vikram

On 9/22/22 7:20 AM, Peter Maydell wrote:

On Sat, 10 Sept 2022 at 09:15, Vikram Garhwal  wrote:

Hi,
This patch implements the CANFD controller for the xlnx-versal-virt machine.
Two controllers, CANFD0@0xFF06_ and CANFD1@0xFF07_, are connected to the
machine.

Basic qtests are also added for data exchange between both controllers in
various supported configs.

Regards,
Vikram

Vikram Garhwal (5):
   MAINTAINERS: Update maintainer's email for Xilinx CAN
   hw/net/can: Introduce Xilinx Versal CANFD controller
   xlnx-zynqmp: Connect Xilinx VERSAL CANFD controllers
   tests/qtest: Introduce tests for Xilinx VERSAL CANFD controller
   MAINTAINERS: Include canfd tests under Xilinx CAN

Hi -- something odd seems to have happened with the cover letter for this
series: the patches haven't ended up as followups to the cover letter.
You can see this in the lore archive where the cover letter ends up
here on its own:
https://lore.kernel.org/qemu-devel/20220910061209.2559-1-vikram.garh...@amd.com/
but the actual patches are here, with 2,3,4,5 showing up as replies to 1:
https://lore.kernel.org/qemu-devel/20220910061252.2614-1-vikram.garh...@amd.com/
This means that patchew couldn't find the patches:
https://patchew.org/QEMU/20220910061209.2559-1-vikram.garh...@amd.com/

If you could look at what went wrong in your config for next time it
will make the tooling happier.

thanks
-- PMM

[RFC PATCH v2 29/29] target/ppc: move the p*_interrupt_powersave methods to excp_helper.c

2022-09-27 Thread Matheus Ferst

Move the methods to excp_helper.c and make them static.

Signed-off-by: Matheus Ferst 
---
 target/ppc/cpu_init.c| 102 ---
 target/ppc/excp_helper.c | 102 +++
 target/ppc/internal.h|   6 ---
 3 files changed, 102 insertions(+), 108 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 4d0064c7a5..a9c2726d51 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5960,30 +5960,6 @@ static bool ppc_pvr_match_power7(PowerPCCPUClass *pcc, 
uint32_t pvr, bool best)
 return true;
 }
 
-int p7_interrupt_powersave(CPUPPCState *env)
-{
-if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
-(env->spr[SPR_LPCR] & LPCR_P7_PECE0)) {
-return PPC_INTERRUPT_EXT;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_DECR) &&
-(env->spr[SPR_LPCR] & LPCR_P7_PECE1)) {
-return PPC_INTERRUPT_DECR;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_MCK) &&
-(env->spr[SPR_LPCR] & LPCR_P7_PECE2)) {
-return PPC_INTERRUPT_MCK;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_HMI) &&
-(env->spr[SPR_LPCR] & LPCR_P7_PECE2)) {
-return PPC_INTERRUPT_HMI;
-}
-if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
-return PPC_INTERRUPT_RESET;
-}
-return 0;
-}
-
 POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
@@ -6120,38 +6096,6 @@ static bool ppc_pvr_match_power8(PowerPCCPUClass *pcc, 
uint32_t pvr, bool best)
 return true;
 }
 
-int p8_interrupt_powersave(CPUPPCState *env)
-{
-if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
-(env->spr[SPR_LPCR] & LPCR_P8_PECE2)) {
-return PPC_INTERRUPT_EXT;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_DECR) &&
-(env->spr[SPR_LPCR] & LPCR_P8_PECE3)) {
-return PPC_INTERRUPT_DECR;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_MCK) &&
-(env->spr[SPR_LPCR] & LPCR_P8_PECE4)) {
-return PPC_INTERRUPT_MCK;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_HMI) &&
-(env->spr[SPR_LPCR] & LPCR_P8_PECE4)) {
-return PPC_INTERRUPT_HMI;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_DOORBELL) &&
-(env->spr[SPR_LPCR] & LPCR_P8_PECE0)) {
-return PPC_INTERRUPT_DOORBELL;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_HDOORBELL) &&
-(env->spr[SPR_LPCR] & LPCR_P8_PECE1)) {
-return PPC_INTERRUPT_HDOORBELL;
-}
-if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
-return PPC_INTERRUPT_RESET;
-}
-return 0;
-}
-
 POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
@@ -6325,52 +6269,6 @@ static bool ppc_pvr_match_power9(PowerPCCPUClass *pcc, 
uint32_t pvr, bool best)
 return false;
 }
 
-int p9_interrupt_powersave(CPUPPCState *env)
-{
-/* External Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
-(env->spr[SPR_LPCR] & LPCR_EEE)) {
-bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
-if (!heic || !FIELD_EX64_HV(env->msr) ||
-FIELD_EX64(env->msr, MSR, PR)) {
-return PPC_INTERRUPT_EXT;
-}
-}
-/* Decrementer Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_DECR) &&
-(env->spr[SPR_LPCR] & LPCR_DEE)) {
-return PPC_INTERRUPT_DECR;
-}
-/* Machine Check or Hypervisor Maintenance Exception */
-if (env->spr[SPR_LPCR] & LPCR_OEE) {
-if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
-return PPC_INTERRUPT_MCK;
-}
-if (env->pending_interrupts & PPC_INTERRUPT_HMI) {
-return PPC_INTERRUPT_HMI;
-}
-}
-/* Privileged Doorbell Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_DOORBELL) &&
-(env->spr[SPR_LPCR] & LPCR_PDEE)) {
-return PPC_INTERRUPT_DOORBELL;
-}
-/* Hypervisor Doorbell Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_HDOORBELL) &&
-(env->spr[SPR_LPCR] & LPCR_HDEE)) {
-return PPC_INTERRUPT_HDOORBELL;
-}
-/* Hypervisor virtualization exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_HVIRT) &&
-(env->spr[SPR_LPCR] & LPCR_HVEE)) {
-return PPC_INTERRUPT_HVIRT;
-}
-if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
-return PPC_INTERRUPT_RESET;
-}
-return 0;
-}
-
 POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 9708f82b30..57937956e4 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1686,6 +1686,30 @@ void ppc_cpu_do_interrupt(CPUState *cs)
  PPC_INTERRUPT_PIT | PPC_INTERRUPT_DOORBELL | PPC_INTERRUPT_HDOORBELL | \
  PPC_INTERRUPT_THERM | PPC_INTERRUPT_EBB)
 
+static int

[RFC PATCH v2 27/29] target/ppc: introduce ppc_maybe_interrupt

2022-09-27 Thread Matheus Ferst

The method checks if any pending interrupt is unmasked and calls
cpu_interrupt/cpu_reset_interrupt accordingly. Code that raises/lowers
or masks/unmasks interrupts should call this method to keep
CPU_INTERRUPT_HARD coherent with env->pending_interrupts.

Signed-off-by: Matheus Ferst 
---
v2:
  - Found many other places where ppc_maybe_interrupt had to be called
with the IO and kvm-nested tests that Cédric suggested.
  - Create a helper to call ppc_maybe_interrupt to avoid using
helper_store_msr in WRTEE[I].

I couldn't find a better name for this method, so I used "maybe
interrupt" just like we have "maybe bswap" for gdbstub registers.
---
 hw/ppc/pnv_core.c|  1 +
 hw/ppc/ppc.c |  7 +--
 hw/ppc/spapr_hcall.c |  6 ++
 hw/ppc/spapr_rtas.c  |  2 +-
 target/ppc/cpu.c |  2 ++
 target/ppc/cpu.h |  1 +
 target/ppc/excp_helper.c | 29 +
 target/ppc/helper.h  |  1 +
 target/ppc/helper_regs.c |  2 ++
 target/ppc/translate.c   |  2 ++
 10 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index 19e8eb885f..9ee79192dd 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -58,6 +58,7 @@ static void pnv_core_cpu_reset(PnvCore *pc, PowerPCCPU *cpu)
 env->msr |= MSR_HVB; /* Hypervisor mode */
 env->spr[SPR_HRMOR] = pc->hrmor;
 hreg_compute_hflags(env);
+ppc_maybe_interrupt(env);
 
 pcc->intc_reset(pc->chip, cpu);
 }
diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 77e611e81c..dc86c1c7db 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -42,7 +42,6 @@ static void cpu_ppc_tb_start (CPUPPCState *env);
 
 void ppc_set_irq(PowerPCCPU *cpu, int irq, int level)
 {
-CPUState *cs = CPU(cpu);
 CPUPPCState *env = &cpu->env;
 unsigned int old_pending;
 bool locked = false;
@@ -57,19 +56,15 @@ void ppc_set_irq(PowerPCCPU *cpu, int irq, int level)
 
 if (level) {
 env->pending_interrupts |= irq;
-cpu_interrupt(cs, CPU_INTERRUPT_HARD);
 } else {
 env->pending_interrupts &= ~irq;
-if (env->pending_interrupts == 0) {
-cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
-}
 }
 
 if (old_pending != env->pending_interrupts) {
+ppc_maybe_interrupt(env);
 kvmppc_set_interrupt(cpu, irq, level);
 }
 
-
 trace_ppc_irq_set_exit(env, irq, level, env->pending_interrupts,
CPU(cpu)->interrupt_request);
 
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index a8d4a6bcf0..23aa41c879 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -490,6 +490,7 @@ static target_ulong h_cede(PowerPCCPU *cpu, 
SpaprMachineState *spapr,
 
 env->msr |= (1ULL << MSR_EE);
 hreg_compute_hflags(env);
+ppc_maybe_interrupt(env);
 
 if (spapr_cpu->prod) {
 spapr_cpu->prod = false;
@@ -500,6 +501,7 @@ static target_ulong h_cede(PowerPCCPU *cpu, 
SpaprMachineState *spapr,
 cs->halted = 1;
 cs->exception_index = EXCP_HLT;
 cs->exit_request = 1;
+ppc_maybe_interrupt(env);
 }
 
 return H_SUCCESS;
@@ -521,6 +523,7 @@ static target_ulong h_confer_self(PowerPCCPU *cpu)
 cs->halted = 1;
 cs->exception_index = EXCP_HALTED;
 cs->exit_request = 1;
+ppc_maybe_interrupt(>env);
 
 return H_SUCCESS;
 }
@@ -633,6 +636,7 @@ static target_ulong h_prod(PowerPCCPU *cpu, 
SpaprMachineState *spapr,
 spapr_cpu = spapr_cpu_state(tcpu);
 spapr_cpu->prod = true;
 cs->halted = 0;
+ppc_maybe_interrupt(>env);
 qemu_cpu_kick(cs);
 
 return H_SUCCESS;
@@ -1661,6 +1665,7 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
 spapr_cpu->in_nested = true;
 
 hreg_compute_hflags(env);
+ppc_maybe_interrupt(env);
 tlb_flush(cs);
 env->reserve_addr = -1; /* Reset the reservation */
 
@@ -1802,6 +1807,7 @@ out_restore_l1:
 spapr_cpu->in_nested = false;
 
 hreg_compute_hflags(env);
+ppc_maybe_interrupt(env);
 tlb_flush(cs);
 env->reserve_addr = -1; /* Reset the reservation */
 
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index d58b65e88f..3f664ea02c 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -214,9 +214,9 @@ static void rtas_stop_self(PowerPCCPU *cpu, 
SpaprMachineState *spapr,
  * guest.
  * For the same reason, set PSSCR_EC.
  */
-ppc_store_lpcr(cpu, env->spr[SPR_LPCR] & ~pcc->lpcr_pm);
 env->spr[SPR_PSSCR] |= PSSCR_EC;
 cs->halted = 1;
+ppc_store_lpcr(cpu, env->spr[SPR_LPCR] & ~pcc->lpcr_pm);
 kvmppc_set_reg_ppc_online(cpu, 0);
 qemu_cpu_kick(cs);
 }
diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
index e95b4c5ee1..1a97b41c6b 100644
--- a/target/ppc/cpu.c
+++ b/target/ppc/cpu.c
@@ -82,6 +82,8 @@ void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
 env->spr[SPR_LPCR] = val & pcc->lpcr_mask;
 /* The gtse bit affects hflags */
 hreg_compute_hflags(env);
+
+ppc_maybe_interrupt(env);
 }
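
The commit message above says every site that raises, lowers, masks, or unmasks an interrupt should call ppc_maybe_interrupt so the hard interrupt line stays coherent with the pending set. A minimal sketch of that idea follows. It is a toy model, not the QEMU implementation: the LineCPU type, the LINE_IRQ_* bit values, and all line_* functions are hypothetical names invented for illustration, and the masking rule is reduced to "machine check always passes, external needs EE".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interrupt bits, chosen only for this sketch. */
enum {
    LINE_IRQ_MCK = 1u << 0, /* machine check, never masked here */
    LINE_IRQ_EXT = 1u << 1, /* external, gated on the EE flag */
};

/* Toy CPU state; hard_line stands in for CPU_INTERRUPT_HARD. */
typedef struct LineCPU {
    uint32_t pending;
    bool msr_ee;
    bool hard_line;
} LineCPU;

/* Return the first pending interrupt that is not masked, or 0. */
static uint32_t line_next_unmasked(const LineCPU *cpu)
{
    if (cpu->pending & LINE_IRQ_MCK) {
        return LINE_IRQ_MCK;
    }
    if (cpu->msr_ee && (cpu->pending & LINE_IRQ_EXT)) {
        return LINE_IRQ_EXT;
    }
    return 0;
}

/* Single place that decides whether the hard line is raised. */
static void line_maybe_interrupt(LineCPU *cpu)
{
    cpu->hard_line = line_next_unmasked(cpu) != 0;
}

/* Raising or lowering an interrupt re-evaluates the line. */
static void line_set_irq(LineCPU *cpu, uint32_t irq, bool level)
{
    uint32_t old_pending = cpu->pending;

    if (level) {
        cpu->pending |= irq;
    } else {
        cpu->pending &= ~irq;
    }
    if (old_pending != cpu->pending) {
        line_maybe_interrupt(cpu);
    }
}

/* Masking or unmasking re-evaluates the line as well. */
static void line_set_msr_ee(LineCPU *cpu, bool ee)
{
    cpu->msr_ee = ee;
    line_maybe_interrupt(cpu);
}
```

In this sketch a masked external interrupt leaves hard_line clear until line_set_msr_ee(cpu, true) is called, which mirrors why the patch adds calls next to the MSR and LPCR updates rather than only in ppc_set_irq.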

[RFC PATCH v2 26/29] target/ppc: remove ppc_store_lpcr from CONFIG_USER_ONLY builds

2022-09-27 Thread Matheus Ferst

Writes to LPCR are hypervisor privileged.

Signed-off-by: Matheus Ferst 
---
The method introduced in the next patch, ppc_maybe_interrupt, will be
called in ppc_store_lpcr and only available in !CONFIG_USER_ONLY builds.
---
 target/ppc/cpu.c | 2 ++
 target/ppc/cpu.h | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
index 0ebac04bc4..e95b4c5ee1 100644
--- a/target/ppc/cpu.c
+++ b/target/ppc/cpu.c
@@ -73,6 +73,7 @@ void ppc_store_msr(CPUPPCState *env, target_ulong value)
 hreg_store_msr(env, value, 0);
 }
 
+#if !defined(CONFIG_USER_ONLY)
 void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
 {
 PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
@@ -82,6 +83,7 @@ void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
 /* The gtse bit affects hflags */
 hreg_compute_hflags(env);
 }
+#endif
 
 static inline void fpscr_set_rounding_mode(CPUPPCState *env)
 {
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 9ccd23db04..7b13d4cf86 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1370,9 +1370,9 @@ void ppc_translate_init(void);
 
 #if !defined(CONFIG_USER_ONLY)
 void ppc_store_sdr1(CPUPPCState *env, target_ulong value);
+void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val);
 #endif /* !defined(CONFIG_USER_ONLY) */
 void ppc_store_msr(CPUPPCState *env, target_ulong value);
-void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val);
 
 void ppc_cpu_list(void);
 
-- 
2.25.1

[RFC PATCH v2 19/29] target/ppc: create an interrupt masking method for POWER7

2022-09-27 Thread Matheus Ferst

The new method is identical to ppc_next_unmasked_interrupt_generic,
processor-specific code will be added/removed in the following patches.
No functional change intended.

Signed-off-by: Matheus Ferst 
---
v2:
  - Renamed the method from ppc_pending_interrupt_p7 to
p7_next_unmasked_interrupt;
  - Processor-specific stuff moved to the following patches to ease
review.
---
 target/ppc/excp_helper.c | 114 +++
 1 file changed, 114 insertions(+)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 609579f45f..8e1e18317d 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1679,6 +1679,118 @@ void ppc_cpu_do_interrupt(CPUState *cs)
 }
 
 #if defined(TARGET_PPC64)
+static int p7_next_unmasked_interrupt(CPUPPCState *env)
+{
+bool async_deliver;
+
+/* External reset */
+if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
+return PPC_INTERRUPT_RESET;
+}
+/* Machine check exception */
+if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
+return PPC_INTERRUPT_MCK;
+}
+#if 0 /* TODO */
+/* External debug exception */
+if (env->pending_interrupts & PPC_INTERRUPT_DEBUG) {
+return PPC_INTERRUPT_DEBUG;
+}
+#endif
+
+/*
+ * For interrupts that gate on MSR:EE, we need to do something a
+ * bit more subtle, as we need to let them through even when EE is
+ * clear when coming out of some power management states (in order
+ * for them to become a 0x100).
+ */
+async_deliver = FIELD_EX64(env->msr, MSR, EE) || env->resume_as_sreset;
+
+/* Hypervisor decrementer exception */
+if (env->pending_interrupts & PPC_INTERRUPT_HDECR) {
+/* LPCR will be clear when not supported so this will work */
+bool hdice = !!(env->spr[SPR_LPCR] & LPCR_HDICE);
+if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hdice) {
+/* HDEC clears on delivery */
+return PPC_INTERRUPT_HDECR;
+}
+}
+
+/* Hypervisor virtualization interrupt */
+if (env->pending_interrupts & PPC_INTERRUPT_HVIRT) {
+/* LPCR will be clear when not supported so this will work */
+bool hvice = !!(env->spr[SPR_LPCR] & LPCR_HVICE);
+if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hvice) {
+return PPC_INTERRUPT_HVIRT;
+}
+}
+
+/* External interrupt can ignore MSR:EE under some circumstances */
+if (env->pending_interrupts & PPC_INTERRUPT_EXT) {
+bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
+bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
+/* HEIC blocks delivery to the hypervisor */
+if ((async_deliver && !(heic && FIELD_EX64_HV(env->msr) &&
+!FIELD_EX64(env->msr, MSR, PR))) ||
+(env->has_hv_mode && !FIELD_EX64_HV(env->msr) && !lpes0)) {
+return PPC_INTERRUPT_EXT;
+}
+}
+if (FIELD_EX64(env->msr, MSR, CE)) {
+/* External critical interrupt */
+if (env->pending_interrupts & PPC_INTERRUPT_CEXT) {
+return PPC_INTERRUPT_CEXT;
+}
+}
+if (async_deliver != 0) {
+/* Watchdog timer on embedded PowerPC */
+if (env->pending_interrupts & PPC_INTERRUPT_WDT) {
+return PPC_INTERRUPT_WDT;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_CDOORBELL) {
+return PPC_INTERRUPT_CDOORBELL;
+}
+/* Fixed interval timer on embedded PowerPC */
+if (env->pending_interrupts & PPC_INTERRUPT_FIT) {
+return PPC_INTERRUPT_FIT;
+}
+/* Programmable interval timer on embedded PowerPC */
+if (env->pending_interrupts & PPC_INTERRUPT_PIT) {
+return PPC_INTERRUPT_PIT;
+}
+/* Decrementer exception */
+if (env->pending_interrupts & PPC_INTERRUPT_DECR) {
+return PPC_INTERRUPT_DECR;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
+return PPC_INTERRUPT_DOORBELL;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_HDOORBELL) {
+return PPC_INTERRUPT_HDOORBELL;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_PERFM) {
+return PPC_INTERRUPT_PERFM;
+}
+/* Thermal interrupt */
+if (env->pending_interrupts & PPC_INTERRUPT_THERM) {
+return PPC_INTERRUPT_THERM;
+}
+/* EBB exception */
+if (env->pending_interrupts & PPC_INTERRUPT_EBB) {
+/*
+ * EBB exception must be taken in problem state and
+ * with BESCR_GE set.
+ */
+if (FIELD_EX64(env->msr, MSR, PR) &&
+(env->spr[SPR_BESCR] & BESCR_GE)) {
+return PPC_INTERRUPT_EBB;
+}
+}
+}
+
+return 0;
+}
+
 #define P8_UNUSED_INTERRUPTS \
 (PPC_INTERRUPT_RESET | PPC_INTERRUPT_DEBUG | PPC_INTERRUPT_HVIRT |  \
 PPC_INTERRUPT_CEXT |

[RFC PATCH v2 25/29] target/ppc: add power-saving interrupt masking logic to p7_next_unmasked_interrupt

2022-09-27 Thread Matheus Ferst

Export p7_interrupt_powersave and use it in p7_next_unmasked_interrupt.

Signed-off-by: Matheus Ferst 
---
 target/ppc/cpu_init.c|  2 +-
 target/ppc/excp_helper.c | 24 
 target/ppc/internal.h|  1 +
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index dd127cbeea..26686d1557 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5960,7 +5960,7 @@ static bool ppc_pvr_match_power7(PowerPCCPUClass *pcc, 
uint32_t pvr, bool best)
 return true;
 }
 
-static int p7_interrupt_powersave(CPUPPCState *env)
+int p7_interrupt_powersave(CPUPPCState *env)
 {
 if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
 (env->spr[SPR_LPCR] & LPCR_P7_PECE0)) {
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index ca594c3b9e..497a9889d1 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1687,10 +1687,18 @@ void ppc_cpu_do_interrupt(CPUState *cs)
 
 static int p7_next_unmasked_interrupt(CPUPPCState *env)
 {
-bool async_deliver;
+PowerPCCPU *cpu = env_archcpu(env);
+CPUState *cs = CPU(cpu);
+/* Ignore MSR[EE] when coming out of some power management states */
+bool msr_ee = FIELD_EX64(env->msr, MSR, EE) || env->resume_as_sreset;
 
 assert((env->pending_interrupts & P7_UNUSED_INTERRUPTS) == 0);
 
+if (cs->halted) {
+/* LPCR[PECE] controls which interrupts can exit power-saving mode */
+return p7_interrupt_powersave(env);
+}
+
 /* Machine check exception */
 if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
 return PPC_INTERRUPT_MCK;
@@ -1702,19 +1710,11 @@ static int p7_next_unmasked_interrupt(CPUPPCState *env)
 }
 #endif
 
-/*
- * For interrupts that gate on MSR:EE, we need to do something a
- * bit more subtle, as we need to let them through even when EE is
- * clear when coming out of some power management states (in order
- * for them to become a 0x100).
- */
-async_deliver = FIELD_EX64(env->msr, MSR, EE) || env->resume_as_sreset;
-
 /* Hypervisor decrementer exception */
 if (env->pending_interrupts & PPC_INTERRUPT_HDECR) {
 /* LPCR will be clear when not supported so this will work */
 bool hdice = !!(env->spr[SPR_LPCR] & LPCR_HDICE);
-if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hdice) {
+if ((msr_ee || !FIELD_EX64_HV(env->msr)) && hdice) {
 /* HDEC clears on delivery */
 return PPC_INTERRUPT_HDECR;
 }
@@ -1725,13 +1725,13 @@ static int p7_next_unmasked_interrupt(CPUPPCState *env)
 bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
 bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
 /* HEIC blocks delivery to the hypervisor */
-if ((async_deliver && !(heic && FIELD_EX64_HV(env->msr) &&
+if ((msr_ee && !(heic && FIELD_EX64_HV(env->msr) &&
 !FIELD_EX64(env->msr, MSR, PR))) ||
 (env->has_hv_mode && !FIELD_EX64_HV(env->msr) && !lpes0)) {
 return PPC_INTERRUPT_EXT;
 }
 }
-if (async_deliver != 0) {
+if (msr_ee != 0) {
 /* Decrementer exception */
 if (env->pending_interrupts & PPC_INTERRUPT_DECR) {
 return PPC_INTERRUPT_DECR;
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index 9069874adb..25827ebf6f 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -309,6 +309,7 @@ static inline int ger_pack_masks(int pmsk, int ymsk, int 
xmsk)
 #if defined(TARGET_PPC64)
 int p9_interrupt_powersave(CPUPPCState *env);
 int p8_interrupt_powersave(CPUPPCState *env);
+int p7_interrupt_powersave(CPUPPCState *env);
 #endif
 
 #endif /* PPC_INTERNAL_H */
-- 
2.25.1
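
The patch above makes the interrupt selection depend on whether the CPU is halted: when halted, only interrupts enabled for wake-up are considered, otherwise the usual MSR[EE] gating applies. The following is a simplified sketch of that dispatch, not the QEMU code. The HaltCPU type, the HALT_IRQ_* bits, the wake_mask field (a stand-in for the LPCR[PECE] bits), and the lowest-bit-first priority are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interrupt bits, chosen only for this sketch. */
enum {
    HALT_IRQ_MCK  = 1u << 0, /* machine check, not gated on EE */
    HALT_IRQ_DECR = 1u << 1, /* decrementer, gated on EE */
};

/* Toy CPU state; wake_mask stands in for LPCR[PECE]. */
typedef struct HaltCPU {
    uint32_t pending;
    uint32_t wake_mask;
    bool msr_ee;
    bool resume_as_sreset;
    bool halted;
} HaltCPU;

/* Isolate the lowest set bit; lowest bit means highest priority here. */
static uint32_t halt_lowest_bit(uint32_t x)
{
    return x & (~x + 1u);
}

static uint32_t halt_next_unmasked_interrupt(const HaltCPU *cpu)
{
    /* EE is ignored when resuming from some power management states. */
    bool msr_ee = cpu->msr_ee || cpu->resume_as_sreset;

    if (cpu->halted) {
        /* Only wake-enabled interrupts can leave power-saving mode. */
        return halt_lowest_bit(cpu->pending & cpu->wake_mask);
    }
    if (cpu->pending & HALT_IRQ_MCK) {
        return HALT_IRQ_MCK;
    }
    if (msr_ee && (cpu->pending & HALT_IRQ_DECR)) {
        return HALT_IRQ_DECR;
    }
    return 0;
}
```

Note that in this toy model a halted CPU ignores msr_ee entirely, so the wake-up decision is driven by wake_mask alone.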

[RFC PATCH v2 18/29] target/ppc: add power-saving interrupt masking logic to p8_next_unmasked_interrupt

2022-09-27 Thread Matheus Ferst

Export p8_interrupt_powersave and use it in p8_next_unmasked_interrupt.

Signed-off-by: Matheus Ferst 
---
 target/ppc/cpu_init.c|  2 +-
 target/ppc/excp_helper.c | 24 
 target/ppc/internal.h|  1 +
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 59e4c325c5..319d2355ec 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6133,7 +6133,7 @@ static bool ppc_pvr_match_power8(PowerPCCPUClass *pcc, 
uint32_t pvr, bool best)
 return true;
 }
 
-static int p8_interrupt_powersave(CPUPPCState *env)
+int p8_interrupt_powersave(CPUPPCState *env)
 {
 if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
 (env->spr[SPR_LPCR] & LPCR_P8_PECE2)) {
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 2e8d4699a9..609579f45f 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1687,28 +1687,28 @@ void ppc_cpu_do_interrupt(CPUState *cs)
 
 static int p8_next_unmasked_interrupt(CPUPPCState *env)
 {
-bool async_deliver;
+PowerPCCPU *cpu = env_archcpu(env);
+CPUState *cs = CPU(cpu);
+/* Ignore MSR[EE] when coming out of some power management states */
+bool msr_ee = FIELD_EX64(env->msr, MSR, EE) || env->resume_as_sreset;
 
 assert((env->pending_interrupts & P8_UNUSED_INTERRUPTS) == 0);
 
+if (cs->halted) {
+/* LPCR[PECE] controls which interrupts can exit power-saving mode */
+return p8_interrupt_powersave(env);
+}
+
 /* Machine check exception */
 if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
 return PPC_INTERRUPT_MCK;
 }
 
-/*
- * For interrupts that gate on MSR:EE, we need to do something a
- * bit more subtle, as we need to let them through even when EE is
- * clear when coming out of some power management states (in order
- * for them to become a 0x100).
- */
-async_deliver = FIELD_EX64(env->msr, MSR, EE) || env->resume_as_sreset;
-
 /* Hypervisor decrementer exception */
 if (env->pending_interrupts & PPC_INTERRUPT_HDECR) {
 /* LPCR will be clear when not supported so this will work */
 bool hdice = !!(env->spr[SPR_LPCR] & LPCR_HDICE);
-if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hdice) {
+if ((msr_ee || !FIELD_EX64_HV(env->msr)) && hdice) {
 /* HDEC clears on delivery */
 return PPC_INTERRUPT_HDECR;
 }
@@ -1719,13 +1719,13 @@ static int p8_next_unmasked_interrupt(CPUPPCState *env)
 bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
 bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
 /* HEIC blocks delivery to the hypervisor */
-if ((async_deliver && !(heic && FIELD_EX64_HV(env->msr) &&
+if ((msr_ee && !(heic && FIELD_EX64_HV(env->msr) &&
 !FIELD_EX64(env->msr, MSR, PR))) ||
 (env->has_hv_mode && !FIELD_EX64_HV(env->msr) && !lpes0)) {
 return PPC_INTERRUPT_EXT;
 }
 }
-if (async_deliver != 0) {
+if (msr_ee != 0) {
 /* Decrementer exception */
 if (env->pending_interrupts & PPC_INTERRUPT_DECR) {
 return PPC_INTERRUPT_DECR;
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index 41e79adfdb..9069874adb 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -308,6 +308,7 @@ static inline int ger_pack_masks(int pmsk, int ymsk, int 
xmsk)
 
 #if defined(TARGET_PPC64)
 int p9_interrupt_powersave(CPUPPCState *env);
+int p8_interrupt_powersave(CPUPPCState *env);
 #endif
 
 #endif /* PPC_INTERNAL_H */
-- 
2.25.1

[RFC PATCH v2 24/29] target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER7

2022-09-27 Thread Matheus Ferst

Move the interrupt masking logic out of cpu_has_work_POWER7 in a new
method, p7_interrupt_powersave, that only returns an interrupt if it can
wake the processor from power-saving mode. No functional change
intended.

Signed-off-by: Matheus Ferst 
---
 target/ppc/cpu_init.c | 45 ---
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 319d2355ec..dd127cbeea 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5960,6 +5960,30 @@ static bool ppc_pvr_match_power7(PowerPCCPUClass *pcc, 
uint32_t pvr, bool best)
 return true;
 }
 
+static int p7_interrupt_powersave(CPUPPCState *env)
+{
+if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
+(env->spr[SPR_LPCR] & LPCR_P7_PECE0)) {
+return PPC_INTERRUPT_EXT;
+}
+if ((env->pending_interrupts & PPC_INTERRUPT_DECR) &&
+(env->spr[SPR_LPCR] & LPCR_P7_PECE1)) {
+return PPC_INTERRUPT_DECR;
+}
+if ((env->pending_interrupts & PPC_INTERRUPT_MCK) &&
+(env->spr[SPR_LPCR] & LPCR_P7_PECE2)) {
+return PPC_INTERRUPT_MCK;
+}
+if ((env->pending_interrupts & PPC_INTERRUPT_HMI) &&
+(env->spr[SPR_LPCR] & LPCR_P7_PECE2)) {
+return PPC_INTERRUPT_HMI;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
+return PPC_INTERRUPT_RESET;
+}
+return 0;
+}
+
 static bool cpu_has_work_POWER7(CPUState *cs)
 {
 PowerPCCPU *cpu = POWERPC_CPU(cs);
@@ -5969,26 +5993,7 @@ static bool cpu_has_work_POWER7(CPUState *cs)
 if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
 return false;
 }
-if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
-(env->spr[SPR_LPCR] & LPCR_P7_PECE0)) {
-return true;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_DECR) &&
-(env->spr[SPR_LPCR] & LPCR_P7_PECE1)) {
-return true;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_MCK) &&
-(env->spr[SPR_LPCR] & LPCR_P7_PECE2)) {
-return true;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_HMI) &&
-(env->spr[SPR_LPCR] & LPCR_P7_PECE2)) {
-return true;
-}
-if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
-return true;
-}
-return false;
+return p7_interrupt_powersave(env) != 0;
 } else {
 return FIELD_EX64(env->msr, MSR, EE) &&
(cs->interrupt_request & CPU_INTERRUPT_HARD);
-- 
2.25.1
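
The refactoring in this patch turns a chain of boolean checks into a function that reports which interrupt can wake the CPU, so the boolean query becomes a comparison against zero. A small stand-alone sketch of the same pattern is shown below. The WakeCPU type, the WAKE_* constants, and both wake_* functions are hypothetical and only mirror the shape of p7_interrupt_powersave; the bit values do not correspond to real LPCR fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pending-interrupt bits, chosen only for this sketch. */
enum {
    WAKE_IRQ_EXT   = 1u << 0,
    WAKE_IRQ_DECR  = 1u << 1,
    WAKE_IRQ_RESET = 1u << 2,
};

/* Hypothetical wake-enable bits standing in for LPCR[PECE]. */
enum {
    WAKE_PECE_EXT  = 1u << 0,
    WAKE_PECE_DECR = 1u << 1,
};

typedef struct WakeCPU {
    uint32_t pending; /* pending interrupt bits */
    uint32_t lpcr;    /* wake-enable bits */
} WakeCPU;

/* Return the first pending interrupt allowed to wake the CPU, or 0. */
static uint32_t wake_interrupt_powersave(const WakeCPU *cpu)
{
    if ((cpu->pending & WAKE_IRQ_EXT) && (cpu->lpcr & WAKE_PECE_EXT)) {
        return WAKE_IRQ_EXT;
    }
    if ((cpu->pending & WAKE_IRQ_DECR) && (cpu->lpcr & WAKE_PECE_DECR)) {
        return WAKE_IRQ_DECR;
    }
    /* Reset wakes the CPU regardless of the enable bits. */
    if (cpu->pending & WAKE_IRQ_RESET) {
        return WAKE_IRQ_RESET;
    }
    return 0;
}

/* The former boolean check is now a thin wrapper. */
static bool wake_has_work_halted(const WakeCPU *cpu)
{
    return wake_interrupt_powersave(cpu) != 0;
}
```

Returning the interrupt rather than a bool lets a later caller reuse the same function to pick the interrupt to deliver, which is how the following patches in the series use it.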

[RFC PATCH v2 23/29] target/ppc: remove generic architecture checks from p7_deliver_interrupt

2022-09-27 Thread Matheus Ferst

No functional change intended.

Signed-off-by: Matheus Ferst 
---
 target/ppc/excp_helper.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index de6972d002..ca594c3b9e 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -2072,9 +2072,6 @@ static void p7_deliver_interrupt(CPUPPCState *env, int interrupt)
 break;
 
 case PPC_INTERRUPT_DECR: /* Decrementer exception */
-if (ppc_decr_clear_on_delivery(env)) {
-env->pending_interrupts &= ~PPC_INTERRUPT_DECR;
-}
 powerpc_excp(cpu, POWERPC_EXCP_DECR);
 break;
 case PPC_INTERRUPT_PERFM:
-- 
2.25.1

[RFC PATCH v2 15/29] target/ppc: remove unused interrupts from p8_deliver_interrupt

2022-09-27 Thread Matheus Ferst

Remove the following unused interrupts from the POWER8 interrupt
processing method:
- PPC_INTERRUPT_RESET: only raised for 6xx, 7xx, 970, and POWER5p;
- Debug Interrupt: removed in Power ISA v2.07;
- Hypervisor Virtualization: introduced in Power ISA v3.0;
- Critical Input, Watchdog Timer, and Fixed Interval Timer: only defined
  for embedded CPUs;
- Hypervisor Doorbell, Doorbell, and Critical Doorbell: processor does
  not implement the "Embedded.Processor Control" category;
- Programmable Interval Timer: 40x-only;
- PPC_INTERRUPT_THERM: only raised for 970 and POWER5p.

Signed-off-by: Matheus Ferst 
---
 target/ppc/excp_helper.c | 48 
 1 file changed, 48 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 0405fc8eee..4cbf6b29fc 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1979,29 +1979,16 @@ static void p8_deliver_interrupt(CPUPPCState *env, int interrupt)
 CPUState *cs = env_cpu(env);
 
 switch (interrupt) {
-case PPC_INTERRUPT_RESET: /* External reset */
-env->pending_interrupts &= ~PPC_INTERRUPT_RESET;
-powerpc_excp(cpu, POWERPC_EXCP_RESET);
-break;
 case PPC_INTERRUPT_MCK: /* Machine check exception */
 env->pending_interrupts &= ~PPC_INTERRUPT_MCK;
 powerpc_excp(cpu, POWERPC_EXCP_MCHECK);
 break;
-#if 0 /* TODO */
-case PPC_INTERRUPT_DEBUG: /* External debug exception */
-env->pending_interrupts &= ~PPC_INTERRUPT_DEBUG;
-powerpc_excp(cpu, POWERPC_EXCP_DEBUG);
-break;
-#endif
 
 case PPC_INTERRUPT_HDECR: /* Hypervisor decrementer exception */
 /* HDEC clears on delivery */
 env->pending_interrupts &= ~PPC_INTERRUPT_HDECR;
 powerpc_excp(cpu, POWERPC_EXCP_HDECR);
 break;
-case PPC_INTERRUPT_HVIRT: /* Hypervisor virtualization interrupt */
-powerpc_excp(cpu, POWERPC_EXCP_HVIRT);
-break;
 
 case PPC_INTERRUPT_EXT:
 if (books_vhyp_promotes_external_to_hvirt(cpu)) {
@@ -2010,52 +1997,17 @@ static void p8_deliver_interrupt(CPUPPCState *env, int interrupt)
 powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL);
 }
 break;
-case PPC_INTERRUPT_CEXT: /* External critical interrupt */
-powerpc_excp(cpu, POWERPC_EXCP_CRITICAL);
-break;
 
-case PPC_INTERRUPT_WDT: /* Watchdog timer on embedded PowerPC */
-env->pending_interrupts &= ~PPC_INTERRUPT_WDT;
-powerpc_excp(cpu, POWERPC_EXCP_WDT);
-break;
-case PPC_INTERRUPT_CDOORBELL:
-env->pending_interrupts &= ~PPC_INTERRUPT_CDOORBELL;
-powerpc_excp(cpu, POWERPC_EXCP_DOORCI);
-break;
-case PPC_INTERRUPT_FIT: /* Fixed interval timer on embedded PowerPC */
-env->pending_interrupts &= ~PPC_INTERRUPT_FIT;
-powerpc_excp(cpu, POWERPC_EXCP_FIT);
-break;
-case PPC_INTERRUPT_PIT: /* Programmable interval timer on embedded PowerPC */
-env->pending_interrupts &= ~PPC_INTERRUPT_PIT;
-powerpc_excp(cpu, POWERPC_EXCP_PIT);
-break;
 case PPC_INTERRUPT_DECR: /* Decrementer exception */
 if (ppc_decr_clear_on_delivery(env)) {
 env->pending_interrupts &= ~PPC_INTERRUPT_DECR;
 }
 powerpc_excp(cpu, POWERPC_EXCP_DECR);
 break;
-case PPC_INTERRUPT_DOORBELL:
-env->pending_interrupts &= ~PPC_INTERRUPT_DOORBELL;
-if (is_book3s_arch2x(env)) {
-powerpc_excp(cpu, POWERPC_EXCP_SDOOR);
-} else {
-powerpc_excp(cpu, POWERPC_EXCP_DOORI);
-}
-break;
-case PPC_INTERRUPT_HDOORBELL:
-env->pending_interrupts &= ~PPC_INTERRUPT_HDOORBELL;
-powerpc_excp(cpu, POWERPC_EXCP_SDOOR_HV);
-break;
 case PPC_INTERRUPT_PERFM:
 env->pending_interrupts &= ~PPC_INTERRUPT_PERFM;
 powerpc_excp(cpu, POWERPC_EXCP_PERFM);
 break;
-case PPC_INTERRUPT_THERM:  /* Thermal interrupt */
-env->pending_interrupts &= ~PPC_INTERRUPT_THERM;
-powerpc_excp(cpu, POWERPC_EXCP_THERM);
-break;
 case PPC_INTERRUPT_EBB: /* EBB exception */
 env->pending_interrupts &= ~PPC_INTERRUPT_EBB;
 if (env->spr[SPR_BESCR] & BESCR_PMEO) {
-- 
2.25.1
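
As a reading aid, the effect of trimming the switch can be sketched outside QEMU. Each remaining case clears its pending bit and raises the matching exception, and any interrupt that is no longer listed lands in the default branch. This is only a sketch under stated assumptions: the DEMO_* values, enum demo_excp and demo_deliver are hypothetical names, the real code calls powerpc_excp() and cpu_abort() instead of returning a code, and the real "case 0" asserts on resume_as_sreset, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interrupt bits, standing in for PPC_INTERRUPT_*. */
enum {
    DEMO_INT_MCK   = 1 << 0,
    DEMO_INT_PERFM = 1 << 1,
    DEMO_INT_THERM = 1 << 2,    /* removed from the switch below */
};

/* Hypothetical result codes, standing in for the exception raised. */
enum demo_excp {
    DEMO_EXCP_NONE,
    DEMO_EXCP_MCHECK,
    DEMO_EXCP_PERFM,
    DEMO_EXCP_INVALID,          /* the real code aborts here */
};

static enum demo_excp demo_deliver(uint32_t *pending, int interrupt)
{
    switch (interrupt) {
    case DEMO_INT_MCK:
        *pending &= ~(uint32_t)DEMO_INT_MCK;    /* clear, then raise */
        return DEMO_EXCP_MCHECK;
    case DEMO_INT_PERFM:
        *pending &= ~(uint32_t)DEMO_INT_PERFM;
        return DEMO_EXCP_PERFM;
    case 0:
        /* Nothing to deliver. */
        return DEMO_EXCP_NONE;
    default:
        /* An interrupt this CPU model does not handle. */
        return DEMO_EXCP_INVALID;
    }
}
```

The point of the sketch is that removing a case does not silently ignore the interrupt: an unexpected value is still caught by the default branch.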

[RFC PATCH v2 22/29] target/ppc: remove unused interrupts from p7_deliver_interrupt

2022-09-27 Thread Matheus Ferst

Remove the following unused interrupts from the POWER7 interrupt
processing method:
- PPC_INTERRUPT_RESET: only raised for 6xx, 7xx, 970, and POWER5p;
- Hypervisor Virtualization: introduced in Power ISA v3.0;
- Hypervisor Doorbell and Event-Based Branch: introduced in
  Power ISA v2.07;
- Critical Input, Watchdog Timer, and Fixed Interval Timer: only defined
  for embedded CPUs;
- Doorbell and Critical Doorbell Interrupt: processor does not implement
  the "Embedded.Processor Control" category;
- Programmable Interval Timer: 40x-only;
- PPC_INTERRUPT_THERM: only raised for 970 and POWER5p.

Signed-off-by: Matheus Ferst 
---
 target/ppc/excp_helper.c | 50 
 1 file changed, 50 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index f32472fb43..de6972d002 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -2046,10 +2046,6 @@ static void p7_deliver_interrupt(CPUPPCState *env, int interrupt)
 CPUState *cs = env_cpu(env);
 
 switch (interrupt) {
-case PPC_INTERRUPT_RESET: /* External reset */
-env->pending_interrupts &= ~PPC_INTERRUPT_RESET;
-powerpc_excp(cpu, POWERPC_EXCP_RESET);
-break;
 case PPC_INTERRUPT_MCK: /* Machine check exception */
 env->pending_interrupts &= ~PPC_INTERRUPT_MCK;
 powerpc_excp(cpu, POWERPC_EXCP_MCHECK);
@@ -2066,9 +2062,6 @@ static void p7_deliver_interrupt(CPUPPCState *env, int interrupt)
 env->pending_interrupts &= ~PPC_INTERRUPT_HDECR;
 powerpc_excp(cpu, POWERPC_EXCP_HDECR);
 break;
-case PPC_INTERRUPT_HVIRT: /* Hypervisor virtualization interrupt */
-powerpc_excp(cpu, POWERPC_EXCP_HVIRT);
-break;
 
 case PPC_INTERRUPT_EXT:
 if (books_vhyp_promotes_external_to_hvirt(cpu)) {
@@ -2077,60 +2070,17 @@ static void p7_deliver_interrupt(CPUPPCState *env, int interrupt)
 powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL);
 }
 break;
-case PPC_INTERRUPT_CEXT: /* External critical interrupt */
-powerpc_excp(cpu, POWERPC_EXCP_CRITICAL);
-break;
 
-case PPC_INTERRUPT_WDT: /* Watchdog timer on embedded PowerPC */
-env->pending_interrupts &= ~PPC_INTERRUPT_WDT;
-powerpc_excp(cpu, POWERPC_EXCP_WDT);
-break;
-case PPC_INTERRUPT_CDOORBELL:
-env->pending_interrupts &= ~PPC_INTERRUPT_CDOORBELL;
-powerpc_excp(cpu, POWERPC_EXCP_DOORCI);
-break;
-case PPC_INTERRUPT_FIT: /* Fixed interval timer on embedded PowerPC */
-env->pending_interrupts &= ~PPC_INTERRUPT_FIT;
-powerpc_excp(cpu, POWERPC_EXCP_FIT);
-break;
-case PPC_INTERRUPT_PIT: /* Programmable interval timer on embedded PowerPC */
-env->pending_interrupts &= ~PPC_INTERRUPT_PIT;
-powerpc_excp(cpu, POWERPC_EXCP_PIT);
-break;
 case PPC_INTERRUPT_DECR: /* Decrementer exception */
 if (ppc_decr_clear_on_delivery(env)) {
 env->pending_interrupts &= ~PPC_INTERRUPT_DECR;
 }
 powerpc_excp(cpu, POWERPC_EXCP_DECR);
 break;
-case PPC_INTERRUPT_DOORBELL:
-env->pending_interrupts &= ~PPC_INTERRUPT_DOORBELL;
-if (is_book3s_arch2x(env)) {
-powerpc_excp(cpu, POWERPC_EXCP_SDOOR);
-} else {
-powerpc_excp(cpu, POWERPC_EXCP_DOORI);
-}
-break;
-case PPC_INTERRUPT_HDOORBELL:
-env->pending_interrupts &= ~PPC_INTERRUPT_HDOORBELL;
-powerpc_excp(cpu, POWERPC_EXCP_SDOOR_HV);
-break;
 case PPC_INTERRUPT_PERFM:
 env->pending_interrupts &= ~PPC_INTERRUPT_PERFM;
 powerpc_excp(cpu, POWERPC_EXCP_PERFM);
 break;
-case PPC_INTERRUPT_THERM:  /* Thermal interrupt */
-env->pending_interrupts &= ~PPC_INTERRUPT_THERM;
-powerpc_excp(cpu, POWERPC_EXCP_THERM);
-break;
-case PPC_INTERRUPT_EBB: /* EBB exception */
-env->pending_interrupts &= ~PPC_INTERRUPT_EBB;
-if (env->spr[SPR_BESCR] & BESCR_PMEO) {
-powerpc_excp(cpu, POWERPC_EXCP_PERFM_EBB);
-} else if (env->spr[SPR_BESCR] & BESCR_EEO) {
-powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL_EBB);
-}
-break;
 case 0:
 /*
  * This is a bug ! It means that has_work took us out of halt without
-- 
2.25.1

[RFC PATCH v2 20/29] target/ppc: remove unused interrupts from p7_pending_interrupt

2022-09-27 Thread Matheus Ferst

Remove the following unused interrupts from the POWER7 interrupt masking
method:
- PPC_INTERRUPT_RESET: only raised for 6xx, 7xx, 970, and POWER5p;
- Hypervisor Virtualization: introduced in Power ISA v3.0;
- Hypervisor Doorbell and Event-Based Branch: introduced in
  Power ISA v2.07;
- Critical Input, Watchdog Timer, and Fixed Interval Timer: only defined
  for embedded CPUs;
- Doorbell and Critical Doorbell Interrupt: processor does not implement
  the "Embedded.Processor Control" category;
- Programmable Interval Timer: 40x-only;
- PPC_INTERRUPT_THERM: only raised for 970 and POWER5p.

Signed-off-by: Matheus Ferst 
---
v2:
  - Remove CDOORBELL and THERM interrupts (farosas);
  - Also remove RESET, DOORBELL, and HDOORBELL interrupts;
  - Assert for the removed interrupts.
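
The "assert for the removed interrupts" idea can be sketched on its own. A mask collects every interrupt the CPU model should never see, and the masking function asserts that none of them is pending before walking the priority order. This is a minimal sketch with made-up names and bit values (DEMO_*, demo_next_unmasked_interrupt); it models only a machine check and a decrementer, not the full POWER7 list.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interrupt bits, standing in for PPC_INTERRUPT_*. */
enum {
    DEMO_INT_MCK   = 1 << 0,
    DEMO_INT_DECR  = 1 << 1,
    DEMO_INT_WDT   = 1 << 2,    /* embedded-only in this sketch */
    DEMO_INT_THERM = 1 << 3,    /* not raised on this CPU model */
};

/* Everything this CPU model is expected never to have pending. */
#define DEMO_UNUSED_INTERRUPTS (DEMO_INT_WDT | DEMO_INT_THERM)

static int demo_next_unmasked_interrupt(uint32_t pending, int msr_ee)
{
    /* Catch a stray interrupt early instead of dropping it silently. */
    assert((pending & DEMO_UNUSED_INTERRUPTS) == 0);

    /* Machine check is not gated by MSR[EE]. */
    if (pending & DEMO_INT_MCK) {
        return DEMO_INT_MCK;
    }
    /* The decrementer is only deliverable when MSR[EE] allows it. */
    if (msr_ee && (pending & DEMO_INT_DECR)) {
        return DEMO_INT_DECR;
    }
    return 0;
}
```

With the assert in place, each removed branch is replaced by a single check, so the function stays short without hiding a modelling error elsewhere.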
---
 target/ppc/excp_helper.c | 63 +---
 1 file changed, 8 insertions(+), 55 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 8e1e18317d..d8522d0b17 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1679,14 +1679,18 @@ void ppc_cpu_do_interrupt(CPUState *cs)
 }
 
 #if defined(TARGET_PPC64)
+#define P7_UNUSED_INTERRUPTS \
+(PPC_INTERRUPT_RESET | PPC_INTERRUPT_HVIRT | PPC_INTERRUPT_CEXT |   \
+ PPC_INTERRUPT_WDT | PPC_INTERRUPT_CDOORBELL | PPC_INTERRUPT_FIT |  \
+ PPC_INTERRUPT_PIT | PPC_INTERRUPT_DOORBELL | PPC_INTERRUPT_HDOORBELL | \
+ PPC_INTERRUPT_THERM | PPC_INTERRUPT_EBB)
+
 static int p7_next_unmasked_interrupt(CPUPPCState *env)
 {
 bool async_deliver;
 
-/* External reset */
-if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
-return PPC_INTERRUPT_RESET;
-}
+assert((env->pending_interrupts & P7_UNUSED_INTERRUPTS) == 0);
+
 /* Machine check exception */
 if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
 return PPC_INTERRUPT_MCK;
@@ -1716,15 +1720,6 @@ static int p7_next_unmasked_interrupt(CPUPPCState *env)
 }
 }
 
-/* Hypervisor virtualization interrupt */
-if (env->pending_interrupts & PPC_INTERRUPT_HVIRT) {
-/* LPCR will be clear when not supported so this will work */
-bool hvice = !!(env->spr[SPR_LPCR] & LPCR_HVICE);
-if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hvice) {
-return PPC_INTERRUPT_HVIRT;
-}
-}
-
 /* External interrupt can ignore MSR:EE under some circumstances */
 if (env->pending_interrupts & PPC_INTERRUPT_EXT) {
 bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
@@ -1736,56 +1731,14 @@ static int p7_next_unmasked_interrupt(CPUPPCState *env)
 return PPC_INTERRUPT_EXT;
 }
 }
-if (FIELD_EX64(env->msr, MSR, CE)) {
-/* External critical interrupt */
-if (env->pending_interrupts & PPC_INTERRUPT_CEXT) {
-return PPC_INTERRUPT_CEXT;
-}
-}
 if (async_deliver != 0) {
-/* Watchdog timer on embedded PowerPC */
-if (env->pending_interrupts & PPC_INTERRUPT_WDT) {
-return PPC_INTERRUPT_WDT;
-}
-if (env->pending_interrupts & PPC_INTERRUPT_CDOORBELL) {
-return PPC_INTERRUPT_CDOORBELL;
-}
-/* Fixed interval timer on embedded PowerPC */
-if (env->pending_interrupts & PPC_INTERRUPT_FIT) {
-return PPC_INTERRUPT_FIT;
-}
-/* Programmable interval timer on embedded PowerPC */
-if (env->pending_interrupts & PPC_INTERRUPT_PIT) {
-return PPC_INTERRUPT_PIT;
-}
 /* Decrementer exception */
 if (env->pending_interrupts & PPC_INTERRUPT_DECR) {
 return PPC_INTERRUPT_DECR;
 }
-if (env->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
-return PPC_INTERRUPT_DOORBELL;
-}
-if (env->pending_interrupts & PPC_INTERRUPT_HDOORBELL) {
-return PPC_INTERRUPT_HDOORBELL;
-}
 if (env->pending_interrupts & PPC_INTERRUPT_PERFM) {
 return PPC_INTERRUPT_PERFM;
 }
-/* Thermal interrupt */
-if (env->pending_interrupts & PPC_INTERRUPT_THERM) {
-return PPC_INTERRUPT_THERM;
-}
-/* EBB exception */
-if (env->pending_interrupts & PPC_INTERRUPT_EBB) {
-/*
- * EBB exception must be taken in problem state and
- * with BESCR_GE set.
- */
-if (FIELD_EX64(env->msr, MSR, PR) &&
-(env->spr[SPR_BESCR] & BESCR_GE)) {
-return PPC_INTERRUPT_EBB;
-}
-}
 }
 
 return 0;
-- 
2.25.1

[RFC PATCH v2 12/29] target/ppc: create an interrupt masking method for POWER8

2022-09-27 Thread Matheus Ferst

The new method is identical to ppc_next_unmasked_interrupt_generic;
processor-specific code will be added/removed in the following patches.

Signed-off-by: Matheus Ferst 
---
v2:
  - Renamed the method from ppc_pending_interrupt_p8 to
p8_next_unmasked_interrupt
  - Processor-specific code was moved to the following patches to ease
    review.
---
 target/ppc/excp_helper.c | 114 +++
 1 file changed, 114 insertions(+)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 5a0d2c11a2..f60d9826d8 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1679,6 +1679,118 @@ void ppc_cpu_do_interrupt(CPUState *cs)
 }
 
 #if defined(TARGET_PPC64)
+static int p8_next_unmasked_interrupt(CPUPPCState *env)
+{
+bool async_deliver;
+
+/* External reset */
+if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
+return PPC_INTERRUPT_RESET;
+}
+/* Machine check exception */
+if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
+return PPC_INTERRUPT_MCK;
+}
+#if 0 /* TODO */
+/* External debug exception */
+if (env->pending_interrupts & PPC_INTERRUPT_DEBUG) {
+return PPC_INTERRUPT_DEBUG;
+}
+#endif
+
+/*
+ * For interrupts that gate on MSR:EE, we need to do something a
+ * bit more subtle, as we need to let them through even when EE is
+ * clear when coming out of some power management states (in order
+ * for them to become a 0x100).
+ */
+async_deliver = FIELD_EX64(env->msr, MSR, EE) || env->resume_as_sreset;
+
+/* Hypervisor decrementer exception */
+if (env->pending_interrupts & PPC_INTERRUPT_HDECR) {
+/* LPCR will be clear when not supported so this will work */
+bool hdice = !!(env->spr[SPR_LPCR] & LPCR_HDICE);
+if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hdice) {
+/* HDEC clears on delivery */
+return PPC_INTERRUPT_HDECR;
+}
+}
+
+/* Hypervisor virtualization interrupt */
+if (env->pending_interrupts & PPC_INTERRUPT_HVIRT) {
+/* LPCR will be clear when not supported so this will work */
+bool hvice = !!(env->spr[SPR_LPCR] & LPCR_HVICE);
+if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hvice) {
+return PPC_INTERRUPT_HVIRT;
+}
+}
+
+/* External interrupt can ignore MSR:EE under some circumstances */
+if (env->pending_interrupts & PPC_INTERRUPT_EXT) {
+bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
+bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
+/* HEIC blocks delivery to the hypervisor */
+if ((async_deliver && !(heic && FIELD_EX64_HV(env->msr) &&
+!FIELD_EX64(env->msr, MSR, PR))) ||
+(env->has_hv_mode && !FIELD_EX64_HV(env->msr) && !lpes0)) {
+return PPC_INTERRUPT_EXT;
+}
+}
+if (FIELD_EX64(env->msr, MSR, CE)) {
+/* External critical interrupt */
+if (env->pending_interrupts & PPC_INTERRUPT_CEXT) {
+return PPC_INTERRUPT_CEXT;
+}
+}
+if (async_deliver != 0) {
+/* Watchdog timer on embedded PowerPC */
+if (env->pending_interrupts & PPC_INTERRUPT_WDT) {
+return PPC_INTERRUPT_WDT;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_CDOORBELL) {
+return PPC_INTERRUPT_CDOORBELL;
+}
+/* Fixed interval timer on embedded PowerPC */
+if (env->pending_interrupts & PPC_INTERRUPT_FIT) {
+return PPC_INTERRUPT_FIT;
+}
+/* Programmable interval timer on embedded PowerPC */
+if (env->pending_interrupts & PPC_INTERRUPT_PIT) {
+return PPC_INTERRUPT_PIT;
+}
+/* Decrementer exception */
+if (env->pending_interrupts & PPC_INTERRUPT_DECR) {
+return PPC_INTERRUPT_DECR;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
+return PPC_INTERRUPT_DOORBELL;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_HDOORBELL) {
+return PPC_INTERRUPT_HDOORBELL;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_PERFM) {
+return PPC_INTERRUPT_PERFM;
+}
+/* Thermal interrupt */
+if (env->pending_interrupts & PPC_INTERRUPT_THERM) {
+return PPC_INTERRUPT_THERM;
+}
+/* EBB exception */
+if (env->pending_interrupts & PPC_INTERRUPT_EBB) {
+/*
+ * EBB exception must be taken in problem state and
+ * with BESCR_GE set.
+ */
+if (FIELD_EX64(env->msr, MSR, PR) &&
+(env->spr[SPR_BESCR] & BESCR_GE)) {
+return PPC_INTERRUPT_EBB;
+}
+}
+}
+
+return 0;
+}
+
 #define P9_UNUSED_INTERRUPTS \
 (PPC_INTERRUPT_RESET | PPC_INTERRUPT_DEBUG | PPC_INTERRUPT_CEXT |   \
  PPC_INTERRUPT_WDT | PPC_INTERRUPT_CDOORBELL |

[RFC PATCH v2 21/29] target/ppc: create an interrupt delivery method for POWER7

2022-09-27 Thread Matheus Ferst

The new method is identical to ppc_deliver_interrupt; processor-specific
code will be added/removed in the following patches. No functional
change intended.

Signed-off-by: Matheus Ferst 
---
 target/ppc/excp_helper.c | 113 +++
 1 file changed, 113 insertions(+)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index d8522d0b17..f32472fb43 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -2040,6 +2040,116 @@ static int ppc_next_unmasked_interrupt(CPUPPCState *env)
 }
 
 #if defined(TARGET_PPC64)
+static void p7_deliver_interrupt(CPUPPCState *env, int interrupt)
+{
+PowerPCCPU *cpu = env_archcpu(env);
+CPUState *cs = env_cpu(env);
+
+switch (interrupt) {
+case PPC_INTERRUPT_RESET: /* External reset */
+env->pending_interrupts &= ~PPC_INTERRUPT_RESET;
+powerpc_excp(cpu, POWERPC_EXCP_RESET);
+break;
+case PPC_INTERRUPT_MCK: /* Machine check exception */
+env->pending_interrupts &= ~PPC_INTERRUPT_MCK;
+powerpc_excp(cpu, POWERPC_EXCP_MCHECK);
+break;
+#if 0 /* TODO */
+case PPC_INTERRUPT_DEBUG: /* External debug exception */
+env->pending_interrupts &= ~PPC_INTERRUPT_DEBUG;
+powerpc_excp(cpu, POWERPC_EXCP_DEBUG);
+break;
+#endif
+
+case PPC_INTERRUPT_HDECR: /* Hypervisor decrementer exception */
+/* HDEC clears on delivery */
+env->pending_interrupts &= ~PPC_INTERRUPT_HDECR;
+powerpc_excp(cpu, POWERPC_EXCP_HDECR);
+break;
+case PPC_INTERRUPT_HVIRT: /* Hypervisor virtualization interrupt */
+powerpc_excp(cpu, POWERPC_EXCP_HVIRT);
+break;
+
+case PPC_INTERRUPT_EXT:
+if (books_vhyp_promotes_external_to_hvirt(cpu)) {
+powerpc_excp(cpu, POWERPC_EXCP_HVIRT);
+} else {
+powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL);
+}
+break;
+case PPC_INTERRUPT_CEXT: /* External critical interrupt */
+powerpc_excp(cpu, POWERPC_EXCP_CRITICAL);
+break;
+
+case PPC_INTERRUPT_WDT: /* Watchdog timer on embedded PowerPC */
+env->pending_interrupts &= ~PPC_INTERRUPT_WDT;
+powerpc_excp(cpu, POWERPC_EXCP_WDT);
+break;
+case PPC_INTERRUPT_CDOORBELL:
+env->pending_interrupts &= ~PPC_INTERRUPT_CDOORBELL;
+powerpc_excp(cpu, POWERPC_EXCP_DOORCI);
+break;
+case PPC_INTERRUPT_FIT: /* Fixed interval timer on embedded PowerPC */
+env->pending_interrupts &= ~PPC_INTERRUPT_FIT;
+powerpc_excp(cpu, POWERPC_EXCP_FIT);
+break;
+case PPC_INTERRUPT_PIT: /* Programmable interval timer on embedded PowerPC */
+env->pending_interrupts &= ~PPC_INTERRUPT_PIT;
+powerpc_excp(cpu, POWERPC_EXCP_PIT);
+break;
+case PPC_INTERRUPT_DECR: /* Decrementer exception */
+if (ppc_decr_clear_on_delivery(env)) {
+env->pending_interrupts &= ~PPC_INTERRUPT_DECR;
+}
+powerpc_excp(cpu, POWERPC_EXCP_DECR);
+break;
+case PPC_INTERRUPT_DOORBELL:
+env->pending_interrupts &= ~PPC_INTERRUPT_DOORBELL;
+if (is_book3s_arch2x(env)) {
+powerpc_excp(cpu, POWERPC_EXCP_SDOOR);
+} else {
+powerpc_excp(cpu, POWERPC_EXCP_DOORI);
+}
+break;
+case PPC_INTERRUPT_HDOORBELL:
+env->pending_interrupts &= ~PPC_INTERRUPT_HDOORBELL;
+powerpc_excp(cpu, POWERPC_EXCP_SDOOR_HV);
+break;
+case PPC_INTERRUPT_PERFM:
+env->pending_interrupts &= ~PPC_INTERRUPT_PERFM;
+powerpc_excp(cpu, POWERPC_EXCP_PERFM);
+break;
+case PPC_INTERRUPT_THERM:  /* Thermal interrupt */
+env->pending_interrupts &= ~PPC_INTERRUPT_THERM;
+powerpc_excp(cpu, POWERPC_EXCP_THERM);
+break;
+case PPC_INTERRUPT_EBB: /* EBB exception */
+env->pending_interrupts &= ~PPC_INTERRUPT_EBB;
+if (env->spr[SPR_BESCR] & BESCR_PMEO) {
+powerpc_excp(cpu, POWERPC_EXCP_PERFM_EBB);
+} else if (env->spr[SPR_BESCR] & BESCR_EEO) {
+powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL_EBB);
+}
+break;
+case 0:
+/*
+ * This is a bug ! It means that has_work took us out of halt without
+ * anything to deliver while in a PM state that requires getting
+ * out via a 0x100
+ *
+ * This means we will incorrectly execute past the power management
+ * instruction instead of triggering a reset.
+ *
+ * It generally means a discrepancy between the wakeup conditions in the
+ * processor has_work implementation and the logic in this function.
+ */
+assert(env->resume_as_sreset != 0);
+break;
+default:
+cpu_abort(cs, "Invalid PowerPC interrupt %d. Aborting\n", interrupt);
+}
+}
+
 static void p8_deliver_interrupt(CPUPPCState *env, int interrupt)
 {
 PowerPCCPU *cpu = env_archcpu(env);
@@ -2293,6

[RFC PATCH v2 14/29] target/ppc: create an interrupt delivery method for POWER8

2022-09-27 Thread Matheus Ferst

The new method is identical to ppc_deliver_interrupt; processor-specific
code will be added/removed in the following patches. No functional
change intended.

Signed-off-by: Matheus Ferst 
---
 target/ppc/excp_helper.c | 113 +++
 1 file changed, 113 insertions(+)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 6ab03b2e12..0405fc8eee 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1973,6 +1973,116 @@ static int ppc_next_unmasked_interrupt(CPUPPCState *env)
 }
 
 #if defined(TARGET_PPC64)
+static void p8_deliver_interrupt(CPUPPCState *env, int interrupt)
+{
+PowerPCCPU *cpu = env_archcpu(env);
+CPUState *cs = env_cpu(env);
+
+switch (interrupt) {
+case PPC_INTERRUPT_RESET: /* External reset */
+env->pending_interrupts &= ~PPC_INTERRUPT_RESET;
+powerpc_excp(cpu, POWERPC_EXCP_RESET);
+break;
+case PPC_INTERRUPT_MCK: /* Machine check exception */
+env->pending_interrupts &= ~PPC_INTERRUPT_MCK;
+powerpc_excp(cpu, POWERPC_EXCP_MCHECK);
+break;
+#if 0 /* TODO */
+case PPC_INTERRUPT_DEBUG: /* External debug exception */
+env->pending_interrupts &= ~PPC_INTERRUPT_DEBUG;
+powerpc_excp(cpu, POWERPC_EXCP_DEBUG);
+break;
+#endif
+
+case PPC_INTERRUPT_HDECR: /* Hypervisor decrementer exception */
+/* HDEC clears on delivery */
+env->pending_interrupts &= ~PPC_INTERRUPT_HDECR;
+powerpc_excp(cpu, POWERPC_EXCP_HDECR);
+break;
+case PPC_INTERRUPT_HVIRT: /* Hypervisor virtualization interrupt */
+powerpc_excp(cpu, POWERPC_EXCP_HVIRT);
+break;
+
+case PPC_INTERRUPT_EXT:
+if (books_vhyp_promotes_external_to_hvirt(cpu)) {
+powerpc_excp(cpu, POWERPC_EXCP_HVIRT);
+} else {
+powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL);
+}
+break;
+case PPC_INTERRUPT_CEXT: /* External critical interrupt */
+powerpc_excp(cpu, POWERPC_EXCP_CRITICAL);
+break;
+
+case PPC_INTERRUPT_WDT: /* Watchdog timer on embedded PowerPC */
+env->pending_interrupts &= ~PPC_INTERRUPT_WDT;
+powerpc_excp(cpu, POWERPC_EXCP_WDT);
+break;
+case PPC_INTERRUPT_CDOORBELL:
+env->pending_interrupts &= ~PPC_INTERRUPT_CDOORBELL;
+powerpc_excp(cpu, POWERPC_EXCP_DOORCI);
+break;
+case PPC_INTERRUPT_FIT: /* Fixed interval timer on embedded PowerPC */
+env->pending_interrupts &= ~PPC_INTERRUPT_FIT;
+powerpc_excp(cpu, POWERPC_EXCP_FIT);
+break;
+case PPC_INTERRUPT_PIT: /* Programmable interval timer on embedded PowerPC */
+env->pending_interrupts &= ~PPC_INTERRUPT_PIT;
+powerpc_excp(cpu, POWERPC_EXCP_PIT);
+break;
+case PPC_INTERRUPT_DECR: /* Decrementer exception */
+if (ppc_decr_clear_on_delivery(env)) {
+env->pending_interrupts &= ~PPC_INTERRUPT_DECR;
+}
+powerpc_excp(cpu, POWERPC_EXCP_DECR);
+break;
+case PPC_INTERRUPT_DOORBELL:
+env->pending_interrupts &= ~PPC_INTERRUPT_DOORBELL;
+if (is_book3s_arch2x(env)) {
+powerpc_excp(cpu, POWERPC_EXCP_SDOOR);
+} else {
+powerpc_excp(cpu, POWERPC_EXCP_DOORI);
+}
+break;
+case PPC_INTERRUPT_HDOORBELL:
+env->pending_interrupts &= ~PPC_INTERRUPT_HDOORBELL;
+powerpc_excp(cpu, POWERPC_EXCP_SDOOR_HV);
+break;
+case PPC_INTERRUPT_PERFM:
+env->pending_interrupts &= ~PPC_INTERRUPT_PERFM;
+powerpc_excp(cpu, POWERPC_EXCP_PERFM);
+break;
+case PPC_INTERRUPT_THERM:  /* Thermal interrupt */
+env->pending_interrupts &= ~PPC_INTERRUPT_THERM;
+powerpc_excp(cpu, POWERPC_EXCP_THERM);
+break;
+case PPC_INTERRUPT_EBB: /* EBB exception */
+env->pending_interrupts &= ~PPC_INTERRUPT_EBB;
+if (env->spr[SPR_BESCR] & BESCR_PMEO) {
+powerpc_excp(cpu, POWERPC_EXCP_PERFM_EBB);
+} else if (env->spr[SPR_BESCR] & BESCR_EEO) {
+powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL_EBB);
+}
+break;
+case 0:
+/*
+ * This is a bug ! It means that has_work took us out of halt without
+ * anything to deliver while in a PM state that requires getting
+ * out via a 0x100
+ *
+ * This means we will incorrectly execute past the power management
+ * instruction instead of triggering a reset.
+ *
+ * It generally means a discrepancy between the wakeup conditions in the
+ * processor has_work implementation and the logic in this function.
+ */
+assert(env->resume_as_sreset != 0);
+break;
+default:
+cpu_abort(cs, "Invalid PowerPC interrupt %d. Aborting\n", interrupt);
+}
+}
+
 static void p9_deliver_interrupt(CPUPPCState *env, int interrupt)
 {
 PowerPCCPU *cpu = env_archcpu(env);
@@ -2167,6

[RFC PATCH v2 11/29] target/ppc: add power-saving interrupt masking logic to p9_next_unmasked_interrupt

2022-09-27 Thread Matheus Ferst

Export p9_interrupt_powersave and use it in p9_next_unmasked_interrupt.

Signed-off-by: Matheus Ferst 
---
Temporarily putting the prototype in internal.h for lack of a better place;
we will un-export p9_interrupt_powersave in future patches.
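
To make the control flow of this patch easier to follow, here is a stand-alone sketch of the halted-CPU handling. When the CPU is halted and PSSCR[EC] is set, only the power-save helper decides what may wake it; when EC is clear, MSR[EE] gating is ignored for the wakeup, and delivery is skipped afterwards if MSR[EE] is really clear. Everything here is an assumption for illustration: struct demo_env flattens the relevant state into booleans, the DEMO_* bits are invented, and only the decrementer is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bits, standing in for PPC_INTERRUPT_DECR and its
 * LPCR[PECE] wakeup enable. */
enum { DEMO_INT_DECR = 1 << 0 };
enum { DEMO_PECE_DECR = 1 << 0 };

/* Flattened stand-in for the CPU state the patch consults. */
struct demo_env {
    uint32_t pending;
    uint32_t lpcr_pece;
    bool halted;
    bool psscr_ec;
    bool msr_ee;
    bool resume_as_sreset;
};

static int demo_interrupt_powersave(const struct demo_env *env)
{
    if ((env->pending & DEMO_INT_DECR) &&
        (env->lpcr_pece & DEMO_PECE_DECR)) {
        return DEMO_INT_DECR;
    }
    return 0;
}

static int demo_next_unmasked_interrupt(const struct demo_env *env)
{
    bool msr_ee = env->msr_ee || env->resume_as_sreset;

    if (env->halted) {
        if (env->psscr_ec) {
            /* EC set: the wakeup enable bits decide. */
            return demo_interrupt_powersave(env);
        }
        /* EC clear: any interrupt wakes the CPU, even EE-gated ones. */
        msr_ee = true;
    }
    if (msr_ee && (env->pending & DEMO_INT_DECR)) {
        return DEMO_INT_DECR;
    }
    return 0;
}

/* Mirrors the early return added to the delivery function. */
static bool demo_should_deliver(const struct demo_env *env)
{
    return !(env->halted && !env->psscr_ec && !env->msr_ee);
}
```

In the EC-clear case the interrupt is used only as a wakeup condition: it is reported as unmasked so the CPU leaves power-saving, but it is not delivered while MSR[EE] is clear.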
---
 target/ppc/cpu_init.c|  2 +-
 target/ppc/excp_helper.c | 46 
 target/ppc/internal.h|  4 
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 1f8f6c6ef2..7889158c52 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6351,7 +6351,7 @@ static bool ppc_pvr_match_power9(PowerPCCPUClass *pcc, uint32_t pvr, bool best)
 return false;
 }
 
-static int p9_interrupt_powersave(CPUPPCState *env)
+int p9_interrupt_powersave(CPUPPCState *env)
 {
 /* External Exception */
 if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 67e73f30ab..5a0d2c11a2 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1686,28 +1686,39 @@ void ppc_cpu_do_interrupt(CPUState *cs)
 
 static int p9_next_unmasked_interrupt(CPUPPCState *env)
 {
-bool async_deliver;
+PowerPCCPU *cpu = env_archcpu(env);
+CPUState *cs = CPU(cpu);
+/* Ignore MSR[EE] when coming out of some power management states */
+bool msr_ee = FIELD_EX64(env->msr, MSR, EE) || env->resume_as_sreset;
 
 assert((env->pending_interrupts & P9_UNUSED_INTERRUPTS) == 0);
 
+if (cs->halted) {
+if (env->spr[SPR_PSSCR] & PSSCR_EC) {
+/*
+ * When PSSCR[EC] is set, LPCR[PECE] controls which interrupts can
+ * wakeup the processor
+ */
+return p9_interrupt_powersave(env);
+} else {
+/*
+ * When it's clear, any system-caused exception exits power-saving
+ * mode, even the ones that gate on MSR[EE].
+ */
+msr_ee = true;
+}
+}
+
 /* Machine check exception */
 if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
 return PPC_INTERRUPT_MCK;
 }
 
-/*
- * For interrupts that gate on MSR:EE, we need to do something a
- * bit more subtle, as we need to let them through even when EE is
- * clear when coming out of some power management states (in order
- * for them to become a 0x100).
- */
-async_deliver = FIELD_EX64(env->msr, MSR, EE) || env->resume_as_sreset;
-
 /* Hypervisor decrementer exception */
 if (env->pending_interrupts & PPC_INTERRUPT_HDECR) {
 /* LPCR will be clear when not supported so this will work */
 bool hdice = !!(env->spr[SPR_LPCR] & LPCR_HDICE);
-if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hdice) {
+if ((msr_ee || !FIELD_EX64_HV(env->msr)) && hdice) {
 /* HDEC clears on delivery */
 return PPC_INTERRUPT_HDECR;
 }
@@ -1717,7 +1728,7 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env)
 if (env->pending_interrupts & PPC_INTERRUPT_HVIRT) {
 /* LPCR will be clear when not supported so this will work */
 bool hvice = !!(env->spr[SPR_LPCR] & LPCR_HVICE);
-if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hvice) {
+if ((msr_ee || !FIELD_EX64_HV(env->msr)) && hvice) {
 return PPC_INTERRUPT_HVIRT;
 }
 }
@@ -1727,13 +1738,13 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env)
 bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
 bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
 /* HEIC blocks delivery to the hypervisor */
-if ((async_deliver && !(heic && FIELD_EX64_HV(env->msr) &&
+if ((msr_ee && !(heic && FIELD_EX64_HV(env->msr) &&
 !FIELD_EX64(env->msr, MSR, PR))) ||
 (env->has_hv_mode && !FIELD_EX64_HV(env->msr) && !lpes0)) {
 return PPC_INTERRUPT_EXT;
 }
 }
-if (async_deliver != 0) {
+if (msr_ee != 0) {
 /* Decrementer exception */
 if (env->pending_interrupts & PPC_INTERRUPT_DECR) {
 return PPC_INTERRUPT_DECR;
@@ -1895,6 +1906,15 @@ static void p9_deliver_interrupt(CPUPPCState *env, int interrupt)
 PowerPCCPU *cpu = env_archcpu(env);
 CPUState *cs = env_cpu(env);
 
+if (cs->halted && !(env->spr[SPR_PSSCR] & PSSCR_EC) &&
+!FIELD_EX64(env->msr, MSR, EE)) {
+/*
+ * A pending interrupt took us out of power-saving, but MSR[EE] says
+ * that we should return to NIP+4 instead of delivering it.
+ */
+return;
+}
+
 switch (interrupt) {
 case PPC_INTERRUPT_MCK: /* Machine check exception */
 env->pending_interrupts &= ~PPC_INTERRUPT_MCK;
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index 337a362205..41e79adfdb 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -306,4 +306,8 @@ static inline int ger_pack_masks(int pmsk, int

[RFC PATCH v2 16/29] target/ppc: remove generic architecture checks from p8_deliver_interrupt

2022-09-27 Thread Matheus Ferst

No functional change intended.

Signed-off-by: Matheus Ferst 
---
 target/ppc/excp_helper.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 4cbf6b29fc..2e8d4699a9 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1999,9 +1999,6 @@ static void p8_deliver_interrupt(CPUPPCState *env, int interrupt)
 break;
 
 case PPC_INTERRUPT_DECR: /* Decrementer exception */
-if (ppc_decr_clear_on_delivery(env)) {
-env->pending_interrupts &= ~PPC_INTERRUPT_DECR;
-}
 powerpc_excp(cpu, POWERPC_EXCP_DECR);
 break;
 case PPC_INTERRUPT_PERFM:
-- 
2.25.1

[RFC PATCH v2 07/29] target/ppc: create an interrupt delivery method for POWER9/POWER10

2022-09-27 Thread Matheus Ferst

The new method is identical to ppc_deliver_interrupt; processor-specific
code will be added/removed in the following patches. No functional
change intended.

Signed-off-by: Matheus Ferst 
---
 target/ppc/excp_helper.c | 118 +++
 1 file changed, 118 insertions(+)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index af2cab01a7..8a32acbc7f 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1889,6 +1889,118 @@ static int ppc_next_unmasked_interrupt(CPUPPCState *env)
 }
 }
 
+#if defined(TARGET_PPC64)
+static void p9_deliver_interrupt(CPUPPCState *env, int interrupt)
+{
+PowerPCCPU *cpu = env_archcpu(env);
+CPUState *cs = env_cpu(env);
+
+switch (interrupt) {
+case PPC_INTERRUPT_RESET: /* External reset */
+env->pending_interrupts &= ~PPC_INTERRUPT_RESET;
+powerpc_excp(cpu, POWERPC_EXCP_RESET);
+break;
+case PPC_INTERRUPT_MCK: /* Machine check exception */
+env->pending_interrupts &= ~PPC_INTERRUPT_MCK;
+powerpc_excp(cpu, POWERPC_EXCP_MCHECK);
+break;
+#if 0 /* TODO */
+case PPC_INTERRUPT_DEBUG: /* External debug exception */
+env->pending_interrupts &= ~PPC_INTERRUPT_DEBUG;
+powerpc_excp(cpu, POWERPC_EXCP_DEBUG);
+break;
+#endif
+
+case PPC_INTERRUPT_HDECR: /* Hypervisor decrementer exception */
+/* HDEC clears on delivery */
+env->pending_interrupts &= ~PPC_INTERRUPT_HDECR;
+powerpc_excp(cpu, POWERPC_EXCP_HDECR);
+break;
+case PPC_INTERRUPT_HVIRT: /* Hypervisor virtualization interrupt */
+powerpc_excp(cpu, POWERPC_EXCP_HVIRT);
+break;
+
+case PPC_INTERRUPT_EXT:
+if (books_vhyp_promotes_external_to_hvirt(cpu)) {
+powerpc_excp(cpu, POWERPC_EXCP_HVIRT);
+} else {
+powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL);
+}
+break;
+case PPC_INTERRUPT_CEXT: /* External critical interrupt */
+powerpc_excp(cpu, POWERPC_EXCP_CRITICAL);
+break;
+
+case PPC_INTERRUPT_WDT: /* Watchdog timer on embedded PowerPC */
+env->pending_interrupts &= ~PPC_INTERRUPT_WDT;
+powerpc_excp(cpu, POWERPC_EXCP_WDT);
+break;
+case PPC_INTERRUPT_CDOORBELL:
+env->pending_interrupts &= ~PPC_INTERRUPT_CDOORBELL;
+powerpc_excp(cpu, POWERPC_EXCP_DOORCI);
+break;
+case PPC_INTERRUPT_FIT: /* Fixed interval timer on embedded PowerPC */
+env->pending_interrupts &= ~PPC_INTERRUPT_FIT;
+powerpc_excp(cpu, POWERPC_EXCP_FIT);
+break;
+case PPC_INTERRUPT_PIT: /* Programmable interval timer on embedded PowerPC */
+env->pending_interrupts &= ~PPC_INTERRUPT_PIT;
+powerpc_excp(cpu, POWERPC_EXCP_PIT);
+break;
+case PPC_INTERRUPT_DECR: /* Decrementer exception */
+if (ppc_decr_clear_on_delivery(env)) {
+env->pending_interrupts &= ~PPC_INTERRUPT_DECR;
+}
+powerpc_excp(cpu, POWERPC_EXCP_DECR);
+break;
+case PPC_INTERRUPT_DOORBELL:
+env->pending_interrupts &= ~PPC_INTERRUPT_DOORBELL;
+if (is_book3s_arch2x(env)) {
+powerpc_excp(cpu, POWERPC_EXCP_SDOOR);
+} else {
+powerpc_excp(cpu, POWERPC_EXCP_DOORI);
+}
+break;
+case PPC_INTERRUPT_HDOORBELL:
+env->pending_interrupts &= ~PPC_INTERRUPT_HDOORBELL;
+powerpc_excp(cpu, POWERPC_EXCP_SDOOR_HV);
+break;
+case PPC_INTERRUPT_PERFM:
+env->pending_interrupts &= ~PPC_INTERRUPT_PERFM;
+powerpc_excp(cpu, POWERPC_EXCP_PERFM);
+break;
+case PPC_INTERRUPT_THERM:  /* Thermal interrupt */
+env->pending_interrupts &= ~PPC_INTERRUPT_THERM;
+powerpc_excp(cpu, POWERPC_EXCP_THERM);
+break;
+case PPC_INTERRUPT_EBB: /* EBB exception */
+env->pending_interrupts &= ~PPC_INTERRUPT_EBB;
+if (env->spr[SPR_BESCR] & BESCR_PMEO) {
+powerpc_excp(cpu, POWERPC_EXCP_PERFM_EBB);
+} else if (env->spr[SPR_BESCR] & BESCR_EEO) {
+powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL_EBB);
+}
+break;
+case 0:
+/*
+ * This is a bug ! It means that has_work took us out of halt without
+ * anything to deliver while in a PM state that requires getting
+ * out via a 0x100
+ *
+ * This means we will incorrectly execute past the power management
+ * instruction instead of triggering a reset.
+ *
+ * It generally means a discrepancy between the wakeup conditions in the
+ * processor has_work implementation and the logic in this function.
+ */
+assert(env->resume_as_sreset != 0);
+break;
+default:
+cpu_abort(cs, "Invalid PowerPC interrupt %d. Aborting\n", interrupt);
+}
+}
+#endif
+
 static void ppc_deliver_interrupt_generic(CPUPPCState *env, int interrupt)
 {
 PowerPCCPU *cpu =

[RFC PATCH v2 09/29] target/ppc: remove generic architecture checks from p9_deliver_interrupt

2022-09-27 Thread Matheus Ferst

No functional change intended.

Signed-off-by: Matheus Ferst 
---
 target/ppc/excp_helper.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 603c956588..67e73f30ab 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1919,18 +1919,11 @@ static void p9_deliver_interrupt(CPUPPCState *env, int interrupt)
 break;
 
 case PPC_INTERRUPT_DECR: /* Decrementer exception */
-if (ppc_decr_clear_on_delivery(env)) {
-env->pending_interrupts &= ~PPC_INTERRUPT_DECR;
-}
 powerpc_excp(cpu, POWERPC_EXCP_DECR);
 break;
 case PPC_INTERRUPT_DOORBELL:
 env->pending_interrupts &= ~PPC_INTERRUPT_DOORBELL;
-if (is_book3s_arch2x(env)) {
-powerpc_excp(cpu, POWERPC_EXCP_SDOOR);
-} else {
-powerpc_excp(cpu, POWERPC_EXCP_DOORI);
-}
+powerpc_excp(cpu, POWERPC_EXCP_SDOOR);
 break;
 case PPC_INTERRUPT_HDOORBELL:
 env->pending_interrupts &= ~PPC_INTERRUPT_HDOORBELL;
-- 
2.25.1

[RFC PATCH v2 17/29] target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER8

2022-09-27 Thread Matheus Ferst

Move the interrupt masking logic out of cpu_has_work_POWER8 into a new
method, p8_interrupt_powersave, that only returns an interrupt if it can
wake the processor from power-saving mode. No functional change
intended.

Signed-off-by: Matheus Ferst 
---
 target/ppc/cpu_init.c | 61 +++
 1 file changed, 33 insertions(+), 28 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 7889158c52..59e4c325c5 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6133,6 +6133,38 @@ static bool ppc_pvr_match_power8(PowerPCCPUClass *pcc, uint32_t pvr, bool best)
 return true;
 }
 
+static int p8_interrupt_powersave(CPUPPCState *env)
+{
+if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
+(env->spr[SPR_LPCR] & LPCR_P8_PECE2)) {
+return PPC_INTERRUPT_EXT;
+}
+if ((env->pending_interrupts & PPC_INTERRUPT_DECR) &&
+(env->spr[SPR_LPCR] & LPCR_P8_PECE3)) {
+return PPC_INTERRUPT_DECR;
+}
+if ((env->pending_interrupts & PPC_INTERRUPT_MCK) &&
+(env->spr[SPR_LPCR] & LPCR_P8_PECE4)) {
+return PPC_INTERRUPT_MCK;
+}
+if ((env->pending_interrupts & PPC_INTERRUPT_HMI) &&
+(env->spr[SPR_LPCR] & LPCR_P8_PECE4)) {
+return PPC_INTERRUPT_HMI;
+}
+if ((env->pending_interrupts & PPC_INTERRUPT_DOORBELL) &&
+(env->spr[SPR_LPCR] & LPCR_P8_PECE0)) {
+return PPC_INTERRUPT_DOORBELL;
+}
+if ((env->pending_interrupts & PPC_INTERRUPT_HDOORBELL) &&
+(env->spr[SPR_LPCR] & LPCR_P8_PECE1)) {
+return PPC_INTERRUPT_HDOORBELL;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
+return PPC_INTERRUPT_RESET;
+}
+return 0;
+}
+
 static bool cpu_has_work_POWER8(CPUState *cs)
 {
 PowerPCCPU *cpu = POWERPC_CPU(cs);
@@ -6142,34 +6174,7 @@ static bool cpu_has_work_POWER8(CPUState *cs)
 if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
 return false;
 }
-if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
-(env->spr[SPR_LPCR] & LPCR_P8_PECE2)) {
-return true;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_DECR) &&
-(env->spr[SPR_LPCR] & LPCR_P8_PECE3)) {
-return true;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_MCK) &&
-(env->spr[SPR_LPCR] & LPCR_P8_PECE4)) {
-return true;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_HMI) &&
-(env->spr[SPR_LPCR] & LPCR_P8_PECE4)) {
-return true;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_DOORBELL) &&
-(env->spr[SPR_LPCR] & LPCR_P8_PECE0)) {
-return true;
-}
-if ((env->pending_interrupts & PPC_INTERRUPT_HDOORBELL) &&
-(env->spr[SPR_LPCR] & LPCR_P8_PECE1)) {
-return true;
-}
-if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
-return true;
-}
-return false;
+return p8_interrupt_powersave(env) != 0;
 } else {
 return FIELD_EX64(env->msr, MSR, EE) &&
(cs->interrupt_request & CPU_INTERRUPT_HARD);
-- 
2.25.1
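
For readers following the thread, the wake-up selection that this patch factors out can be sketched in isolation. This is only a minimal illustration, not the QEMU code: the SketchEnv type, the SK_* bit values and both function names are hypothetical stand-ins for CPUPPCState, the PPC_INTERRUPT_* and LPCR_P8_PECE* definitions, and p8_interrupt_powersave.

```c
#include <stdint.h>

/* Hypothetical pending-interrupt bits, chosen only for this sketch. */
enum {
    SK_INT_EXT      = 1u << 0,
    SK_INT_DECR     = 1u << 1,
    SK_INT_DOORBELL = 1u << 2,
};

/* Hypothetical wake-up enable bits, standing in for the LPCR PECE bits. */
enum {
    SK_WAKE_EXT      = 1u << 0,
    SK_WAKE_DECR     = 1u << 1,
    SK_WAKE_DOORBELL = 1u << 2,
};

typedef struct {
    uint32_t pending;      /* pending interrupt bits */
    uint32_t wake_enable;  /* which sources may wake a halted CPU */
} SketchEnv;

/* Return the first pending interrupt that may wake the CPU, or 0 if none. */
static int sketch_interrupt_powersave(const SketchEnv *env)
{
    if ((env->pending & SK_INT_EXT) && (env->wake_enable & SK_WAKE_EXT)) {
        return SK_INT_EXT;
    }
    if ((env->pending & SK_INT_DECR) && (env->wake_enable & SK_WAKE_DECR)) {
        return SK_INT_DECR;
    }
    if ((env->pending & SK_INT_DOORBELL) &&
        (env->wake_enable & SK_WAKE_DOORBELL)) {
        return SK_INT_DOORBELL;
    }
    return 0;
}

/* Boolean wrapper, same shape as "p8_interrupt_powersave(env) != 0". */
static int sketch_has_work_halted(const SketchEnv *env)
{
    return sketch_interrupt_powersave(env) != 0;
}
```

Returning the interrupt rather than a bool keeps the has_work check a one-liner while letting other callers learn which source caused the wake-up.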

[RFC PATCH v2 13/29] target/ppc: remove unused interrupts from p8_pending_interrupt

2022-09-27 Thread Matheus Ferst

Remove the following unused interrupts from the POWER8 interrupt masking
method:
- PPC_INTERRUPT_RESET: only raised for 6xx, 7xx, 970, and POWER5p;
- Debug Interrupt: removed in Power ISA v2.07;
- Hypervisor Virtualization: introduced in Power ISA v3.0;
- Critical Input, Watchdog Timer, and Fixed Interval Timer: only defined
  for embedded CPUs;
- Hypervisor Doorbell, Doorbell, and Critical Doorbell: processor does
  not implement the "Embedded.Processor Control" category;
- Programmable Interval Timer: 40x-only;
- PPC_INTERRUPT_THERM: only raised for 970 and POWER5p.

Signed-off-by: Matheus Ferst 
---
v2:
  - Remove CDOORBELL and THERM interrupts (farosas);
  - Also remove RESET, DEBUG, DOORBELL, and HDOORBELL interrupts;
  - Assert for the removed interrupts.
---
 target/ppc/excp_helper.c | 58 ++--
 1 file changed, 8 insertions(+), 50 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index f60d9826d8..6ab03b2e12 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1679,24 +1679,22 @@ void ppc_cpu_do_interrupt(CPUState *cs)
 }
 
 #if defined(TARGET_PPC64)
+#define P8_UNUSED_INTERRUPTS \
+(PPC_INTERRUPT_RESET | PPC_INTERRUPT_DEBUG | PPC_INTERRUPT_HVIRT |  \
+PPC_INTERRUPT_CEXT | PPC_INTERRUPT_WDT | PPC_INTERRUPT_CDOORBELL |  \
+PPC_INTERRUPT_FIT | PPC_INTERRUPT_PIT | PPC_INTERRUPT_DOORBELL |\
+PPC_INTERRUPT_HDOORBELL | PPC_INTERRUPT_THERM)
+
 static int p8_next_unmasked_interrupt(CPUPPCState *env)
 {
 bool async_deliver;
 
-/* External reset */
-if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
-return PPC_INTERRUPT_RESET;
-}
+assert((env->pending_interrupts & P8_UNUSED_INTERRUPTS) == 0);
+
 /* Machine check exception */
 if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
 return PPC_INTERRUPT_MCK;
 }
-#if 0 /* TODO */
-/* External debug exception */
-if (env->pending_interrupts & PPC_INTERRUPT_DEBUG) {
-return PPC_INTERRUPT_DEBUG;
-}
-#endif
 
 /*
  * For interrupts that gate on MSR:EE, we need to do something a
@@ -1716,15 +1714,6 @@ static int p8_next_unmasked_interrupt(CPUPPCState *env)
 }
 }
 
-/* Hypervisor virtualization interrupt */
-if (env->pending_interrupts & PPC_INTERRUPT_HVIRT) {
-/* LPCR will be clear when not supported so this will work */
-bool hvice = !!(env->spr[SPR_LPCR] & LPCR_HVICE);
-if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hvice) {
-return PPC_INTERRUPT_HVIRT;
-}
-}
-
 /* External interrupt can ignore MSR:EE under some circumstances */
 if (env->pending_interrupts & PPC_INTERRUPT_EXT) {
 bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
@@ -1736,45 +1725,14 @@ static int p8_next_unmasked_interrupt(CPUPPCState *env)
 return PPC_INTERRUPT_EXT;
 }
 }
-if (FIELD_EX64(env->msr, MSR, CE)) {
-/* External critical interrupt */
-if (env->pending_interrupts & PPC_INTERRUPT_CEXT) {
-return PPC_INTERRUPT_CEXT;
-}
-}
 if (async_deliver != 0) {
-/* Watchdog timer on embedded PowerPC */
-if (env->pending_interrupts & PPC_INTERRUPT_WDT) {
-return PPC_INTERRUPT_WDT;
-}
-if (env->pending_interrupts & PPC_INTERRUPT_CDOORBELL) {
-return PPC_INTERRUPT_CDOORBELL;
-}
-/* Fixed interval timer on embedded PowerPC */
-if (env->pending_interrupts & PPC_INTERRUPT_FIT) {
-return PPC_INTERRUPT_FIT;
-}
-/* Programmable interval timer on embedded PowerPC */
-if (env->pending_interrupts & PPC_INTERRUPT_PIT) {
-return PPC_INTERRUPT_PIT;
-}
 /* Decrementer exception */
 if (env->pending_interrupts & PPC_INTERRUPT_DECR) {
 return PPC_INTERRUPT_DECR;
 }
-if (env->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
-return PPC_INTERRUPT_DOORBELL;
-}
-if (env->pending_interrupts & PPC_INTERRUPT_HDOORBELL) {
-return PPC_INTERRUPT_HDOORBELL;
-}
 if (env->pending_interrupts & PPC_INTERRUPT_PERFM) {
 return PPC_INTERRUPT_PERFM;
 }
-/* Thermal interrupt */
-if (env->pending_interrupts & PPC_INTERRUPT_THERM) {
-return PPC_INTERRUPT_THERM;
-}
 /* EBB exception */
 if (env->pending_interrupts & PPC_INTERRUPT_EBB) {
 /*
-- 
2.25.1
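
The "assert for the removed interrupts" approach from the v2 notes can be illustrated with a small standalone sketch. It assumes made-up bit values and names (SK_INT_*, SK_UNUSED_INTERRUPTS, sketch_next_unmasked); only the pattern of OR-ing the unused sources into one mask and asserting that none is pending is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interrupt bits, chosen only for this sketch. */
enum {
    SK_INT_MCK  = 1u << 0,
    SK_INT_DECR = 1u << 1,
    SK_INT_WDT  = 1u << 2,  /* treated as embedded-only here */
    SK_INT_PIT  = 1u << 3,  /* treated as 40x-only here */
};

/* Sources this CPU model should never see, combined into a single mask. */
#define SK_UNUSED_INTERRUPTS (SK_INT_WDT | SK_INT_PIT)

/* Non-zero if any source from the unused mask is pending. */
static int sketch_has_unused(uint32_t pending)
{
    return (pending & SK_UNUSED_INTERRUPTS) != 0;
}

/*
 * Pick the next deliverable interrupt. 'ee' stands in for MSR[EE]:
 * machine checks ignore it, the decrementer is gated on it.
 */
static int sketch_next_unmasked(uint32_t pending, int ee)
{
    /* A pending unused source points at a bug in whoever raised it. */
    assert(!sketch_has_unused(pending));

    if (pending & SK_INT_MCK) {
        return SK_INT_MCK;
    }
    if (ee && (pending & SK_INT_DECR)) {
        return SK_INT_DECR;
    }
    return 0;
}
```

Compared with silently dropping the checks, the assert turns a wrongly raised interrupt into an immediate failure instead of an interrupt that stays pending forever.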

[RFC PATCH v2 08/29] target/ppc: remove unused interrupts from p9_deliver_interrupt

2022-09-27 Thread Matheus Ferst

Remove the following unused interrupts from the POWER9 interrupt
processing method:
- PPC_INTERRUPT_RESET: only raised for 6xx, 7xx, 970 and POWER5p;
- Debug Interrupt: removed in Power ISA v2.07;
- Critical Input, Watchdog Timer, and Fixed Interval Timer: only defined
  for embedded CPUs;
- Critical Doorbell Interrupt: removed in Power ISA v3.0;
- Programmable Interval Timer: 40x-only.

Signed-off-by: Matheus Ferst 
---
 target/ppc/excp_helper.c | 33 -
 1 file changed, 33 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 8a32acbc7f..603c956588 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1896,20 +1896,10 @@ static void p9_deliver_interrupt(CPUPPCState *env, int interrupt)
 CPUState *cs = env_cpu(env);
 
 switch (interrupt) {
-case PPC_INTERRUPT_RESET: /* External reset */
-env->pending_interrupts &= ~PPC_INTERRUPT_RESET;
-powerpc_excp(cpu, POWERPC_EXCP_RESET);
-break;
 case PPC_INTERRUPT_MCK: /* Machine check exception */
 env->pending_interrupts &= ~PPC_INTERRUPT_MCK;
 powerpc_excp(cpu, POWERPC_EXCP_MCHECK);
 break;
-#if 0 /* TODO */
-case PPC_INTERRUPT_DEBUG: /* External debug exception */
-env->pending_interrupts &= ~PPC_INTERRUPT_DEBUG;
-powerpc_excp(cpu, POWERPC_EXCP_DEBUG);
-break;
-#endif
 
 case PPC_INTERRUPT_HDECR: /* Hypervisor decrementer exception */
 /* HDEC clears on delivery */
@@ -1927,26 +1917,7 @@ static void p9_deliver_interrupt(CPUPPCState *env, int interrupt)
 powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL);
 }
 break;
-case PPC_INTERRUPT_CEXT: /* External critical interrupt */
-powerpc_excp(cpu, POWERPC_EXCP_CRITICAL);
-break;
 
-case PPC_INTERRUPT_WDT: /* Watchdog timer on embedded PowerPC */
-env->pending_interrupts &= ~PPC_INTERRUPT_WDT;
-powerpc_excp(cpu, POWERPC_EXCP_WDT);
-break;
-case PPC_INTERRUPT_CDOORBELL:
-env->pending_interrupts &= ~PPC_INTERRUPT_CDOORBELL;
-powerpc_excp(cpu, POWERPC_EXCP_DOORCI);
-break;
-case PPC_INTERRUPT_FIT: /* Fixed interval timer on embedded PowerPC */
-env->pending_interrupts &= ~PPC_INTERRUPT_FIT;
-powerpc_excp(cpu, POWERPC_EXCP_FIT);
-break;
-case PPC_INTERRUPT_PIT: /* Programmable interval timer on embedded PowerPC */
-env->pending_interrupts &= ~PPC_INTERRUPT_PIT;
-powerpc_excp(cpu, POWERPC_EXCP_PIT);
-break;
 case PPC_INTERRUPT_DECR: /* Decrementer exception */
 if (ppc_decr_clear_on_delivery(env)) {
 env->pending_interrupts &= ~PPC_INTERRUPT_DECR;
@@ -1969,10 +1940,6 @@ static void p9_deliver_interrupt(CPUPPCState *env, int interrupt)
 env->pending_interrupts &= ~PPC_INTERRUPT_PERFM;
 powerpc_excp(cpu, POWERPC_EXCP_PERFM);
 break;
-case PPC_INTERRUPT_THERM:  /* Thermal interrupt */
-env->pending_interrupts &= ~PPC_INTERRUPT_THERM;
-powerpc_excp(cpu, POWERPC_EXCP_THERM);
-break;
 case PPC_INTERRUPT_EBB: /* EBB exception */
 env->pending_interrupts &= ~PPC_INTERRUPT_EBB;
 if (env->spr[SPR_BESCR] & BESCR_PMEO) {
-- 
2.25.1
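
The delivery method trimmed here is the second half of the masking/delivery split used in this series: one function picks the next unmasked interrupt, another clears its pending bit where appropriate and raises the exception. A minimal sketch of that shape follows. SketchCPU, the SK_* values and the three function names are hypothetical; last_excp merely records what a real implementation would pass to powerpc_excp.

```c
#include <stdint.h>

/* Hypothetical interrupt bits, chosen only for this sketch. */
enum {
    SK_INT_MCK = 1u << 0,
    SK_INT_EXT = 1u << 1,
};

typedef struct {
    uint32_t pending;  /* pending interrupt bits */
    int ee;            /* stand-in for MSR[EE] */
    int last_excp;     /* records which exception was "raised" */
} SketchCPU;

/* Masking step: decide which interrupt, if any, may be taken now. */
static int sketch_next_unmasked(const SketchCPU *cpu)
{
    if (cpu->pending & SK_INT_MCK) {
        return SK_INT_MCK;
    }
    if (cpu->ee && (cpu->pending & SK_INT_EXT)) {
        return SK_INT_EXT;
    }
    return 0;
}

/* Delivery step: update the pending state and raise the exception. */
static void sketch_deliver(SketchCPU *cpu, int interrupt)
{
    switch (interrupt) {
    case SK_INT_MCK:
        /* Cleared on delivery, like the MCK case in the diff. */
        cpu->pending &= ~(uint32_t)SK_INT_MCK;
        cpu->last_excp = SK_INT_MCK;
        break;
    case SK_INT_EXT:
        /* Left pending here; the EXT case in the diff does not clear it. */
        cpu->last_excp = SK_INT_EXT;
        break;
    default:
        break;
    }
}

/* Caller: run the masking step, then deliver only if something was chosen. */
static int sketch_service(SketchCPU *cpu)
{
    int interrupt = sketch_next_unmasked(cpu);

    if (interrupt != 0) {
        sketch_deliver(cpu, interrupt);
    }
    return interrupt;
}
```

Keeping the two steps separate means a per-model delivery switch only needs cases for interrupts that the matching masking function can actually return.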

[RFC PATCH v2 28/29] target/ppc: unify cpu->has_work based on cs->interrupt_request

2022-09-27 Thread Matheus Ferst

Now that cs->interrupt_request indicates if there is any unmasked
interrupt, checking if the CPU has work to do can be simplified to a
single check that works for all CPU models.

Signed-off-by: Matheus Ferst 
---
 target/ppc/cpu_init.c | 94 +--
 1 file changed, 1 insertion(+), 93 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 26686d1557..4d0064c7a5 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5984,27 +5984,10 @@ int p7_interrupt_powersave(CPUPPCState *env)
 return 0;
 }
 
-static bool cpu_has_work_POWER7(CPUState *cs)
-{
-PowerPCCPU *cpu = POWERPC_CPU(cs);
-CPUPPCState *env = &cpu->env;
-
-if (cs->halted) {
-if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
-return false;
-}
-return p7_interrupt_powersave(env) != 0;
-} else {
-return FIELD_EX64(env->msr, MSR, EE) &&
-   (cs->interrupt_request & CPU_INTERRUPT_HARD);
-}
-}
-
 POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
-CPUClass *cc = CPU_CLASS(oc);
 
 dc->fw_name = "PowerPC,POWER7";
 dc->desc = "POWER7";
@@ -6013,7 +5996,6 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
 pcc->pcr_supported = PCR_COMPAT_2_06 | PCR_COMPAT_2_05;
 pcc->init_proc = init_proc_POWER7;
 pcc->check_pow = check_pow_nocheck;
-cc->has_work = cpu_has_work_POWER7;
 pcc->insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB |
PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |
PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |
@@ -6170,27 +6152,10 @@ int p8_interrupt_powersave(CPUPPCState *env)
 return 0;
 }
 
-static bool cpu_has_work_POWER8(CPUState *cs)
-{
-PowerPCCPU *cpu = POWERPC_CPU(cs);
-CPUPPCState *env = &cpu->env;
-
-if (cs->halted) {
-if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
-return false;
-}
-return p8_interrupt_powersave(env) != 0;
-} else {
-return FIELD_EX64(env->msr, MSR, EE) &&
-   (cs->interrupt_request & CPU_INTERRUPT_HARD);
-}
-}
-
 POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
-CPUClass *cc = CPU_CLASS(oc);
 
 dc->fw_name = "PowerPC,POWER8";
 dc->desc = "POWER8";
@@ -6199,7 +6164,6 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
 pcc->pcr_supported = PCR_COMPAT_2_07 | PCR_COMPAT_2_06 | PCR_COMPAT_2_05;
 pcc->init_proc = init_proc_POWER8;
 pcc->check_pow = check_pow_nocheck;
-cc->has_work = cpu_has_work_POWER8;
 pcc->insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB |
PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |
PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |
@@ -6407,35 +6371,10 @@ int p9_interrupt_powersave(CPUPPCState *env)
 return 0;
 }
 
-static bool cpu_has_work_POWER9(CPUState *cs)
-{
-PowerPCCPU *cpu = POWERPC_CPU(cs);
-CPUPPCState *env = &cpu->env;
-
-if (cs->halted) {
-uint64_t psscr = env->spr[SPR_PSSCR];
-
-if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
-return false;
-}
-
-/* If EC is clear, just return true on any pending interrupt */
-if (!(psscr & PSSCR_EC)) {
-return true;
-}
-
-return p9_interrupt_powersave(env) != 0;
-} else {
-return FIELD_EX64(env->msr, MSR, EE) &&
-   (cs->interrupt_request & CPU_INTERRUPT_HARD);
-}
-}
-
 POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
-CPUClass *cc = CPU_CLASS(oc);
 
 dc->fw_name = "PowerPC,POWER9";
 dc->desc = "POWER9";
@@ -6445,7 +6384,6 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
  PCR_COMPAT_2_05;
 pcc->init_proc = init_proc_POWER9;
 pcc->check_pow = check_pow_nocheck;
-cc->has_work = cpu_has_work_POWER9;
 pcc->insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB |
PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |
PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |
@@ -6604,35 +6542,10 @@ static bool ppc_pvr_match_power10(PowerPCCPUClass *pcc, uint32_t pvr, bool best)
 return false;
 }
 
-static bool cpu_has_work_POWER10(CPUState *cs)
-{
-PowerPCCPU *cpu = POWERPC_CPU(cs);
-CPUPPCState *env = &cpu->env;
-
-if (cs->halted) {
-uint64_t psscr = env->spr[SPR_PSSCR];
-
-if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
-return false;
-}
-
-/* If EC is clear, just return true on any pending interrupt */
-if (!(psscr & PSSCR_EC)) {
-return true;
-}
-
-return

[RFC PATCH v2 06/29] target/ppc: remove unused interrupts from p9_pending_interrupt

2022-09-27 Thread Matheus Ferst

Remove the following unused interrupts from the POWER9 interrupt masking
method:
- PPC_INTERRUPT_RESET: only raised for 6xx, 7xx, 970, and POWER5p;
- Debug Interrupt: removed in Power ISA v2.07;
- Critical Input, Watchdog Timer, and Fixed Interval Timer: only defined
  for embedded CPUs;
- Critical Doorbell Interrupt: removed in Power ISA v3.0;
- Programmable Interval Timer: 40x-only.

Signed-off-by: Matheus Ferst 
---
v2:
  - Remove CDOORBELL and THERM (farosas);
  - Also remove RESET and DEBUG interrupts;
  - Assert for the removed interrupts.
---
 target/ppc/excp_helper.c | 42 +++-
 1 file changed, 7 insertions(+), 35 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index f2b0845735..af2cab01a7 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1679,24 +1679,21 @@ void ppc_cpu_do_interrupt(CPUState *cs)
 }
 
 #if defined(TARGET_PPC64)
+#define P9_UNUSED_INTERRUPTS \
+(PPC_INTERRUPT_RESET | PPC_INTERRUPT_DEBUG | PPC_INTERRUPT_CEXT |   \
+ PPC_INTERRUPT_WDT | PPC_INTERRUPT_CDOORBELL | PPC_INTERRUPT_FIT |  \
+ PPC_INTERRUPT_PIT | PPC_INTERRUPT_THERM)
+
 static int p9_next_unmasked_interrupt(CPUPPCState *env)
 {
 bool async_deliver;
 
-/* External reset */
-if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
-return PPC_INTERRUPT_RESET;
-}
+assert((env->pending_interrupts & P9_UNUSED_INTERRUPTS) == 0);
+
 /* Machine check exception */
 if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
 return PPC_INTERRUPT_MCK;
 }
-#if 0 /* TODO */
-/* External debug exception */
-if (env->pending_interrupts & PPC_INTERRUPT_DEBUG) {
-return PPC_INTERRUPT_DEBUG;
-}
-#endif
 
 /*
  * For interrupts that gate on MSR:EE, we need to do something a
@@ -1736,28 +1733,7 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env)
 return PPC_INTERRUPT_EXT;
 }
 }
-if (FIELD_EX64(env->msr, MSR, CE)) {
-/* External critical interrupt */
-if (env->pending_interrupts & PPC_INTERRUPT_CEXT) {
-return PPC_INTERRUPT_CEXT;
-}
-}
 if (async_deliver != 0) {
-/* Watchdog timer on embedded PowerPC */
-if (env->pending_interrupts & PPC_INTERRUPT_WDT) {
-return PPC_INTERRUPT_WDT;
-}
-if (env->pending_interrupts & PPC_INTERRUPT_CDOORBELL) {
-return PPC_INTERRUPT_CDOORBELL;
-}
-/* Fixed interval timer on embedded PowerPC */
-if (env->pending_interrupts & PPC_INTERRUPT_FIT) {
-return PPC_INTERRUPT_FIT;
-}
-/* Programmable interval timer on embedded PowerPC */
-if (env->pending_interrupts & PPC_INTERRUPT_PIT) {
-return PPC_INTERRUPT_PIT;
-}
 /* Decrementer exception */
 if (env->pending_interrupts & PPC_INTERRUPT_DECR) {
 return PPC_INTERRUPT_DECR;
@@ -1771,10 +1747,6 @@ static int p9_next_unmasked_interrupt(CPUPPCState *env)
 if (env->pending_interrupts & PPC_INTERRUPT_PERFM) {
 return PPC_INTERRUPT_PERFM;
 }
-/* Thermal interrupt */
-if (env->pending_interrupts & PPC_INTERRUPT_THERM) {
-return PPC_INTERRUPT_THERM;
-}
 /* EBB exception */
 if (env->pending_interrupts & PPC_INTERRUPT_EBB) {
 /*
-- 
2.25.1

[RFC PATCH v2 10/29] target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER9

2022-09-27 Thread Matheus Ferst

Move the interrupt masking logic out of cpu_has_work_POWER9 into a new
method, p9_interrupt_powersave, that only returns an interrupt if it can
wake the processor from power-saving mode. No functional change
intended.

Signed-off-by: Matheus Ferst 
---
 target/ppc/cpu_init.c | 126 +-
 1 file changed, 50 insertions(+), 76 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 4b4b3feac9..1f8f6c6ef2 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6351,6 +6351,52 @@ static bool ppc_pvr_match_power9(PowerPCCPUClass *pcc, uint32_t pvr, bool best)
 return false;
 }
 
+static int p9_interrupt_powersave(CPUPPCState *env)
+{
+/* External Exception */
+if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
+(env->spr[SPR_LPCR] & LPCR_EEE)) {
+bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
+if (!heic || !FIELD_EX64_HV(env->msr) ||
+FIELD_EX64(env->msr, MSR, PR)) {
+return PPC_INTERRUPT_EXT;
+}
+}
+/* Decrementer Exception */
+if ((env->pending_interrupts & PPC_INTERRUPT_DECR) &&
+(env->spr[SPR_LPCR] & LPCR_DEE)) {
+return PPC_INTERRUPT_DECR;
+}
+/* Machine Check or Hypervisor Maintenance Exception */
+if (env->spr[SPR_LPCR] & LPCR_OEE) {
+if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
+return PPC_INTERRUPT_MCK;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_HMI) {
+return PPC_INTERRUPT_HMI;
+}
+}
+/* Privileged Doorbell Exception */
+if ((env->pending_interrupts & PPC_INTERRUPT_DOORBELL) &&
+(env->spr[SPR_LPCR] & LPCR_PDEE)) {
+return PPC_INTERRUPT_DOORBELL;
+}
+/* Hypervisor Doorbell Exception */
+if ((env->pending_interrupts & PPC_INTERRUPT_HDOORBELL) &&
+(env->spr[SPR_LPCR] & LPCR_HDEE)) {
+return PPC_INTERRUPT_HDOORBELL;
+}
+/* Hypervisor virtualization exception */
+if ((env->pending_interrupts & PPC_INTERRUPT_HVIRT) &&
+(env->spr[SPR_LPCR] & LPCR_HVEE)) {
+return PPC_INTERRUPT_HVIRT;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
+return PPC_INTERRUPT_RESET;
+}
+return 0;
+}
+
 static bool cpu_has_work_POWER9(CPUState *cs)
 {
 PowerPCCPU *cpu = POWERPC_CPU(cs);
@@ -6367,44 +6413,8 @@ static bool cpu_has_work_POWER9(CPUState *cs)
 if (!(psscr & PSSCR_EC)) {
 return true;
 }
-/* External Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
-(env->spr[SPR_LPCR] & LPCR_EEE)) {
-bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
-if (!heic || !FIELD_EX64_HV(env->msr) ||
-FIELD_EX64(env->msr, MSR, PR)) {
-return true;
-}
-}
-/* Decrementer Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_DECR) &&
-(env->spr[SPR_LPCR] & LPCR_DEE)) {
-return true;
-}
-/* Machine Check or Hypervisor Maintenance Exception */
-if ((env->pending_interrupts & (PPC_INTERRUPT_MCK | PPC_INTERRUPT_HMI))
-&& (env->spr[SPR_LPCR] & LPCR_OEE)) {
-return true;
-}
-/* Privileged Doorbell Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_DOORBELL) &&
-(env->spr[SPR_LPCR] & LPCR_PDEE)) {
-return true;
-}
-/* Hypervisor Doorbell Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_HDOORBELL) &&
-(env->spr[SPR_LPCR] & LPCR_HDEE)) {
-return true;
-}
-/* Hypervisor virtualization exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_HVIRT) &&
-(env->spr[SPR_LPCR] & LPCR_HVEE)) {
-return true;
-}
-if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
-return true;
-}
-return false;
+
+return p9_interrupt_powersave(env) != 0;
 } else {
 return FIELD_EX64(env->msr, MSR, EE) &&
(cs->interrupt_request & CPU_INTERRUPT_HARD);
@@ -6600,44 +6610,8 @@ static bool cpu_has_work_POWER10(CPUState *cs)
 if (!(psscr & PSSCR_EC)) {
 return true;
 }
-/* External Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_EXT) &&
-(env->spr[SPR_LPCR] & LPCR_EEE)) {
-bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
-if (!heic || !FIELD_EX64_HV(env->msr) ||
-FIELD_EX64(env->msr, MSR, PR)) {
-return true;
-}
-}
-/* Decrementer Exception */
-if ((env->pending_interrupts & PPC_INTERRUPT_DECR) &&
-(env->spr[SPR_LPCR] & LPCR_DEE)) {
-return true;
-}
-/* Machine Check or Hypervisor Maintenance Exception */
-if ((env->pending_interrupts &

[RFC PATCH v2 03/29] target/ppc: split interrupt masking and delivery from ppc_hw_interrupt

2022-09-27 Thread Matheus Ferst

Split ppc_hw_interrupt into an interrupt masking method,
ppc_next_unmasked_interrupt, and an interrupt processing method,
ppc_deliver_interrupt.

Signed-off-by: Matheus Ferst 
---
v2:
  - ppc_hw_interrupt renamed as ppc_deliver_interrupt (farosas);
  - Handle the "Wakeup from PM state but interrupt Undelivered" case
as an assert in ppc_deliver_interrupt (farosas).
---
 target/ppc/excp_helper.c | 207 +--
 1 file changed, 131 insertions(+), 76 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index c3c30c5d1b..c6381489b6 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1678,29 +1678,22 @@ void ppc_cpu_do_interrupt(CPUState *cs)
 powerpc_excp(cpu, cs->exception_index);
 }
 
-static void ppc_hw_interrupt(CPUPPCState *env)
+static int ppc_next_unmasked_interrupt(CPUPPCState *env)
 {
-PowerPCCPU *cpu = env_archcpu(env);
 bool async_deliver;
 
 /* External reset */
 if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
-env->pending_interrupts &= ~PPC_INTERRUPT_RESET;
-powerpc_excp(cpu, POWERPC_EXCP_RESET);
-return;
+return PPC_INTERRUPT_RESET;
 }
 /* Machine check exception */
 if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
-env->pending_interrupts &= ~PPC_INTERRUPT_MCK;
-powerpc_excp(cpu, POWERPC_EXCP_MCHECK);
-return;
+return PPC_INTERRUPT_MCK;
 }
 #if 0 /* TODO */
 /* External debug exception */
 if (env->pending_interrupts & PPC_INTERRUPT_DEBUG) {
-env->pending_interrupts &= ~PPC_INTERRUPT_DEBUG;
-powerpc_excp(cpu, POWERPC_EXCP_DEBUG);
-return;
+return PPC_INTERRUPT_DEBUG;
 }
 #endif
 
@@ -1718,9 +1711,7 @@ static void ppc_hw_interrupt(CPUPPCState *env)
 bool hdice = !!(env->spr[SPR_LPCR] & LPCR_HDICE);
 if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hdice) {
 /* HDEC clears on delivery */
-env->pending_interrupts &= ~PPC_INTERRUPT_HDECR;
-powerpc_excp(cpu, POWERPC_EXCP_HDECR);
-return;
+return PPC_INTERRUPT_HDECR;
 }
 }
 
@@ -1729,8 +1720,7 @@ static void ppc_hw_interrupt(CPUPPCState *env)
 /* LPCR will be clear when not supported so this will work */
 bool hvice = !!(env->spr[SPR_LPCR] & LPCR_HVICE);
 if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hvice) {
-powerpc_excp(cpu, POWERPC_EXCP_HVIRT);
-return;
+return PPC_INTERRUPT_HVIRT;
 }
 }
 
@@ -1742,77 +1732,47 @@ static void ppc_hw_interrupt(CPUPPCState *env)
 if ((async_deliver && !(heic && FIELD_EX64_HV(env->msr) &&
 !FIELD_EX64(env->msr, MSR, PR))) ||
 (env->has_hv_mode && !FIELD_EX64_HV(env->msr) && !lpes0)) {
-if (books_vhyp_promotes_external_to_hvirt(cpu)) {
-powerpc_excp(cpu, POWERPC_EXCP_HVIRT);
-} else {
-powerpc_excp(cpu, POWERPC_EXCP_EXTERNAL);
-}
-return;
+return PPC_INTERRUPT_EXT;
 }
 }
 if (FIELD_EX64(env->msr, MSR, CE)) {
 /* External critical interrupt */
 if (env->pending_interrupts & PPC_INTERRUPT_CEXT) {
-powerpc_excp(cpu, POWERPC_EXCP_CRITICAL);
-return;
+return PPC_INTERRUPT_CEXT;
 }
 }
 if (async_deliver != 0) {
 /* Watchdog timer on embedded PowerPC */
 if (env->pending_interrupts & PPC_INTERRUPT_WDT) {
-env->pending_interrupts &= ~PPC_INTERRUPT_WDT;
-powerpc_excp(cpu, POWERPC_EXCP_WDT);
-return;
+return PPC_INTERRUPT_WDT;
 }
 if (env->pending_interrupts & PPC_INTERRUPT_CDOORBELL) {
-env->pending_interrupts &= ~PPC_INTERRUPT_CDOORBELL;
-powerpc_excp(cpu, POWERPC_EXCP_DOORCI);
-return;
+return PPC_INTERRUPT_CDOORBELL;
 }
 /* Fixed interval timer on embedded PowerPC */
 if (env->pending_interrupts & PPC_INTERRUPT_FIT) {
-env->pending_interrupts &= ~PPC_INTERRUPT_FIT;
-powerpc_excp(cpu, POWERPC_EXCP_FIT);
-return;
+return PPC_INTERRUPT_FIT;
 }
 /* Programmable interval timer on embedded PowerPC */
 if (env->pending_interrupts & PPC_INTERRUPT_PIT) {
-env->pending_interrupts &= ~PPC_INTERRUPT_PIT;
-powerpc_excp(cpu, POWERPC_EXCP_PIT);
-return;
+return PPC_INTERRUPT_PIT;
 }
 /* Decrementer exception */
 if (env->pending_interrupts & PPC_INTERRUPT_DECR) {
-if (ppc_decr_clear_on_delivery(env)) {
-env->pending_interrupts &= ~PPC_INTERRUPT_DECR;
-}
-powerpc_excp(cpu, POWERPC_EXCP_DECR);
-return;
+return PPC_INTERRUPT_DECR;
 }
 if (env->pending_interrupts &

[RFC PATCH v2 05/29] target/ppc: create an interrupt masking method for POWER9/POWER10

2022-09-27 Thread Matheus Ferst

The new method is identical to ppc_next_unmasked_interrupt_generic;
processor-specific code will be added/removed in the following patches.
No functional change intended.

Signed-off-by: Matheus Ferst 
---
v2:
  - Renamed the method from ppc_pending_interrupt_p9 to
p9_next_unmasked_interrupt
  - Processor-specific changes were moved to the following patches to ease
review.
---
 target/ppc/excp_helper.c | 119 +++
 1 file changed, 119 insertions(+)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 6da4dba616..f2b0845735 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1678,6 +1678,120 @@ void ppc_cpu_do_interrupt(CPUState *cs)
 powerpc_excp(cpu, cs->exception_index);
 }
 
+#if defined(TARGET_PPC64)
+static int p9_next_unmasked_interrupt(CPUPPCState *env)
+{
+bool async_deliver;
+
+/* External reset */
+if (env->pending_interrupts & PPC_INTERRUPT_RESET) {
+return PPC_INTERRUPT_RESET;
+}
+/* Machine check exception */
+if (env->pending_interrupts & PPC_INTERRUPT_MCK) {
+return PPC_INTERRUPT_MCK;
+}
+#if 0 /* TODO */
+/* External debug exception */
+if (env->pending_interrupts & PPC_INTERRUPT_DEBUG) {
+return PPC_INTERRUPT_DEBUG;
+}
+#endif
+
+/*
+ * For interrupts that gate on MSR:EE, we need to do something a
+ * bit more subtle, as we need to let them through even when EE is
+ * clear when coming out of some power management states (in order
+ * for them to become a 0x100).
+ */
+async_deliver = FIELD_EX64(env->msr, MSR, EE) || env->resume_as_sreset;
+
+/* Hypervisor decrementer exception */
+if (env->pending_interrupts & PPC_INTERRUPT_HDECR) {
+/* LPCR will be clear when not supported so this will work */
+bool hdice = !!(env->spr[SPR_LPCR] & LPCR_HDICE);
+if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hdice) {
+/* HDEC clears on delivery */
+return PPC_INTERRUPT_HDECR;
+}
+}
+
+/* Hypervisor virtualization interrupt */
+if (env->pending_interrupts & PPC_INTERRUPT_HVIRT) {
+/* LPCR will be clear when not supported so this will work */
+bool hvice = !!(env->spr[SPR_LPCR] & LPCR_HVICE);
+if ((async_deliver || !FIELD_EX64_HV(env->msr)) && hvice) {
+return PPC_INTERRUPT_HVIRT;
+}
+}
+
+/* External interrupt can ignore MSR:EE under some circumstances */
+if (env->pending_interrupts & PPC_INTERRUPT_EXT) {
+bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
+bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
+/* HEIC blocks delivery to the hypervisor */
+if ((async_deliver && !(heic && FIELD_EX64_HV(env->msr) &&
+!FIELD_EX64(env->msr, MSR, PR))) ||
+(env->has_hv_mode && !FIELD_EX64_HV(env->msr) && !lpes0)) {
+return PPC_INTERRUPT_EXT;
+}
+}
+if (FIELD_EX64(env->msr, MSR, CE)) {
+/* External critical interrupt */
+if (env->pending_interrupts & PPC_INTERRUPT_CEXT) {
+return PPC_INTERRUPT_CEXT;
+}
+}
+if (async_deliver != 0) {
+/* Watchdog timer on embedded PowerPC */
+if (env->pending_interrupts & PPC_INTERRUPT_WDT) {
+return PPC_INTERRUPT_WDT;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_CDOORBELL) {
+return PPC_INTERRUPT_CDOORBELL;
+}
+/* Fixed interval timer on embedded PowerPC */
+if (env->pending_interrupts & PPC_INTERRUPT_FIT) {
+return PPC_INTERRUPT_FIT;
+}
+/* Programmable interval timer on embedded PowerPC */
+if (env->pending_interrupts & PPC_INTERRUPT_PIT) {
+return PPC_INTERRUPT_PIT;
+}
+/* Decrementer exception */
+if (env->pending_interrupts & PPC_INTERRUPT_DECR) {
+return PPC_INTERRUPT_DECR;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
+return PPC_INTERRUPT_DOORBELL;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_HDOORBELL) {
+return PPC_INTERRUPT_HDOORBELL;
+}
+if (env->pending_interrupts & PPC_INTERRUPT_PERFM) {
+return PPC_INTERRUPT_PERFM;
+}
+/* Thermal interrupt */
+if (env->pending_interrupts & PPC_INTERRUPT_THERM) {
+return PPC_INTERRUPT_THERM;
+}
+/* EBB exception */
+if (env->pending_interrupts & PPC_INTERRUPT_EBB) {
+/*
+ * EBB exception must be taken in problem state and
+ * with BESCR_GE set.
+ */
+if (FIELD_EX64(env->msr, MSR, PR) &&
+(env->spr[SPR_BESCR] & BESCR_GE)) {
+return PPC_INTERRUPT_EBB;
+}
+}
+}
+
+return 0;
+}
+#endif
+
 static int ppc_next_unmasked_interrupt_generic(CPUPPCState *env)
 {
 bool

[RFC PATCH v2 04/29] target/ppc: prepare to split interrupt masking and delivery by excp_model

2022-09-27 Thread Matheus Ferst

No functional change intended.

Signed-off-by: Matheus Ferst 
---
v2:
  - Use "generic" instead of "legacy" to name the original methods (farosas).
---
 target/ppc/excp_helper.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index c6381489b6..6da4dba616 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1678,7 +1678,7 @@ void ppc_cpu_do_interrupt(CPUState *cs)
 powerpc_excp(cpu, cs->exception_index);
 }
 
-static int ppc_next_unmasked_interrupt(CPUPPCState *env)
+static int ppc_next_unmasked_interrupt_generic(CPUPPCState *env)
 {
 bool async_deliver;
 
@@ -1790,7 +1790,15 @@ static int ppc_next_unmasked_interrupt(CPUPPCState *env)
 return 0;
 }
 
-static void ppc_deliver_interrupt(CPUPPCState *env, int interrupt)
+static int ppc_next_unmasked_interrupt(CPUPPCState *env)
+{
+switch (env->excp_model) {
+default:
+return ppc_next_unmasked_interrupt_generic(env);
+}
+}
+
+static void ppc_deliver_interrupt_generic(CPUPPCState *env, int interrupt)
 {
 PowerPCCPU *cpu = env_archcpu(env);
 CPUState *cs = env_cpu(env);
@@ -1900,6 +1908,14 @@ static void ppc_deliver_interrupt(CPUPPCState *env, int interrupt)
 }
 }
 
+static void ppc_deliver_interrupt(CPUPPCState *env, int interrupt)
+{
+switch (env->excp_model) {
+default:
+ppc_deliver_interrupt_generic(env, interrupt);
+}
+}
+
 void ppc_cpu_do_system_reset(CPUState *cs)
 {
 PowerPCCPU *cpu = POWERPC_CPU(cs);
-- 
2.25.1

[RFC PATCH v2 02/29] target/ppc: always use ppc_set_irq to set env->pending_interrupts

2022-09-27 Thread Matheus Ferst

Use ppc_set_irq to raise/clear interrupts to ensure CPU_INTERRUPT_HARD
will be set/reset accordingly.
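
As an illustration of why a single entry point matters, here is a minimal sketch. It is not QEMU code: `toy_cpu`, `toy_set_irq()`, `toy_clear_open_coded()` and the `IRQ_*` values are hypothetical stand-ins for `env->pending_interrupts`, `ppc_set_irq()` and the `PPC_INTERRUPT_*` masks. It assumes the helper drops the hard-interrupt request once nothing is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not QEMU's real types or values. */
#define IRQ_DOORBELL 0x1u
#define IRQ_EBB      0x2u

struct toy_cpu {
    uint32_t pending;      /* models env->pending_interrupts */
    bool hard_requested;   /* models CPU_INTERRUPT_HARD being raised */
};

/* Single entry point: the mask and the request flag change together. */
static void toy_set_irq(struct toy_cpu *cpu, uint32_t irq, int level)
{
    assert(irq != 0);
    if (level) {
        cpu->pending |= irq;
        cpu->hard_requested = true;
    } else {
        cpu->pending &= ~irq;
        if (cpu->pending == 0) {
            cpu->hard_requested = false;
        }
    }
}

/* Open-coded clear: the mask changes but the request flag goes stale. */
static void toy_clear_open_coded(struct toy_cpu *cpu, uint32_t irq)
{
    cpu->pending &= ~irq;
}
```

With the open-coded clear, the model ends up with no pending interrupt but the request flag still raised, which is the inconsistency the helper avoids.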

Signed-off-by: Matheus Ferst 
---
 target/ppc/excp_helper.c | 17 +++--
 target/ppc/misc_helper.c |  9 ++---
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 3f8ff9bcf3..c3c30c5d1b 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -23,6 +23,7 @@
 #include "exec/exec-all.h"
 #include "internal.h"
 #include "helper_regs.h"
+#include "hw/ppc/ppc.h"
 
 #include "trace.h"
 
@@ -2080,7 +2081,6 @@ void helper_rfebb(CPUPPCState *env, target_ulong s)
 static void do_ebb(CPUPPCState *env, int ebb_excp)
 {
 PowerPCCPU *cpu = env_archcpu(env);
-CPUState *cs = CPU(cpu);
 
 /*
  * FSCR_EBB and FSCR_IC_EBB are the same bits used with
@@ -2098,8 +2098,7 @@ static void do_ebb(CPUPPCState *env, int ebb_excp)
 if (FIELD_EX64(env->msr, MSR, PR)) {
 powerpc_excp(cpu, ebb_excp);
 } else {
-env->pending_interrupts |= PPC_INTERRUPT_EBB;
-cpu_interrupt(cs, CPU_INTERRUPT_HARD);
+ppc_set_irq(cpu, PPC_INTERRUPT_EBB, 1);
 }
 }
 
@@ -2292,7 +2291,7 @@ void helper_msgclr(CPUPPCState *env, target_ulong rb)
 return;
 }
 
-env->pending_interrupts &= ~irq;
+ppc_set_irq(env_archcpu(env), irq, 0);
 }
 
 void helper_msgsnd(target_ulong rb)
@@ -2311,8 +2310,7 @@ void helper_msgsnd(target_ulong rb)
 CPUPPCState *cenv = &cpu->env;
 
 if ((rb & DBELL_BRDCAST) || (cenv->spr[SPR_BOOKE_PIR] == pir)) {
-cenv->pending_interrupts |= irq;
-cpu_interrupt(cs, CPU_INTERRUPT_HARD);
+ppc_set_irq(cpu, irq, 1);
 }
 }
 qemu_mutex_unlock_iothread();
@@ -2336,7 +2334,7 @@ void helper_book3s_msgclr(CPUPPCState *env, target_ulong rb)
 return;
 }
 
-env->pending_interrupts &= ~PPC_INTERRUPT_HDOORBELL;
+ppc_set_irq(env_archcpu(env), PPC_INTERRUPT_HDOORBELL, 0);
 }
 
 static void book3s_msgsnd_common(int pir, int irq)
@@ -2350,8 +2348,7 @@ static void book3s_msgsnd_common(int pir, int irq)
 
 /* TODO: broadcast message to all threads of the same  processor */
 if (cenv->spr_cb[SPR_PIR].default_value == pir) {
-cenv->pending_interrupts |= irq;
-cpu_interrupt(cs, CPU_INTERRUPT_HARD);
+ppc_set_irq(cpu, irq, 1);
 }
 }
 qemu_mutex_unlock_iothread();
@@ -2377,7 +2374,7 @@ void helper_book3s_msgclrp(CPUPPCState *env, target_ulong rb)
 return;
 }
 
-env->pending_interrupts &= ~PPC_INTERRUPT_DOORBELL;
+ppc_set_irq(env_archcpu(env), PPC_INTERRUPT_DOORBELL, 0);
 }
 
 /*
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
index 05e35572bc..a9bc1522e2 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -25,6 +25,7 @@
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "mmu-book3s-v3.h"
+#include "hw/ppc/ppc.h"
 
 #include "helper_regs.h"
 
@@ -173,7 +174,6 @@ target_ulong helper_load_dpdes(CPUPPCState *env)
 void helper_store_dpdes(CPUPPCState *env, target_ulong val)
 {
 PowerPCCPU *cpu = env_archcpu(env);
-CPUState *cs = CPU(cpu);
 
 helper_hfscr_facility_check(env, HFSCR_MSGP, "store DPDES", HFSCR_IC_MSGP);
 
@@ -184,12 +184,7 @@ void helper_store_dpdes(CPUPPCState *env, target_ulong val)
 return;
 }
 
-if (val & 0x1) {
-env->pending_interrupts |= PPC_INTERRUPT_DOORBELL;
-cpu_interrupt(cs, CPU_INTERRUPT_HARD);
-} else {
-env->pending_interrupts &= ~PPC_INTERRUPT_DOORBELL;
-}
+ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, val & 0x1);
 }
 #endif /* defined(TARGET_PPC64) */
 
-- 
2.25.1

[RFC PATCH v2 00/29] PowerPC interrupt rework

2022-09-27 Thread Matheus Ferst

Link to v1: https://lists.gnu.org/archive/html/qemu-ppc/2022-08/msg00370.html
This series is also available as a git branch: 
https://github.com/PPC64/qemu/tree/ferst-interrupt-fix-v2

This version addresses Fabiano's feedback and fixes some issues found
with the tests suggested by Cédric. While working on it, I found two
intermittent problems on master:

 i) ~10% of boots with pSeries and 970/970mp/POWER5+ hard lockup after
either SCSI or network initialization when using -smp 4. With
-smp 2, the problem is harder to reproduce but still happens, and I
couldn't reproduce with thread=single.
ii) ~52% of KVM guest initializations on PowerNV hang in different parts
of the boot process when using more than one CPU.

With the complete series applied, I couldn't reproduce (i) anymore, and
(ii) became a little more frequent (~58%).

I've tested each patch of this series with [1], modified to use -smp for
machines that support more than one CPU. The machines I can currently
boot with FreeBSD (970/970mp/POWER5+/POWER7/POWER8/POWER9 pSeries,
POWER8/POWER9 PowerNV, and mpc8544ds) were tested with the images from
[2] and still boot after applying the patch series. Booting nested
guests inside a TCG pSeries machine also seems to be working fine.

Using command lines like:

./qemu-system-ppc64 -M powernv9 -cpu POWER9 -accel tcg,thread=multi \
-m 8G -smp $SMP -vga none -nographic -kernel zImage \
-append 'console=hvc0' -initrd rootfs.cpio.xz \
-serial pipe:pipe -monitor unix:mon,server,nowait

and

./qemu-system-ppc64 -M pseries -cpu POWER9 -accel tcg,thread=multi \
-m 8G -smp $SMP -vga none -nographic -kernel zImage \
-append 'console=hvc0' -initrd rootfs.cpio.xz \
-serial pipe:pipe -monitor unix:mon,server,nowait

to measure the time to boot, login, and shut down a compressed kernel
with a buildroot initramfs, with 100 iterations we get:

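+-----+------------------------------+-----------------------------+
|     |           PowerNV            |           pSeries           |
|-smp |---------------+--------------+--------------+--------------+
|     |    master     | patch series |    master    | patch series |
+-----+---------------+--------------+--------------+--------------+
|  1  |  45,84 ± 0,92 | 38,08 ± 0,66 | 23,56 ± 1,16 | 23,76 ± 1,04 |
|  2  |  80,21 ± 8,03 | 40,81 ± 0,45 | 26,59 ± 0,92 | 26,88 ± 0,99 |
|  4  | 115,98 ± 9,85 | 38,80 ± 0,44 | 28,83 ± 0,84 | 28,46 ± 0,94 |
|  6  | 199,14 ± 6,36 | 39,32 ± 0,50 | 29,22 ± 0,78 | 29,45 ± 0,86 |
|  8  | 47,85 ± 27,50 | 38,98 ± 0,49 | 29,63 ± 0,80 | 29,60 ± 0,78 |
+-----+---------------+--------------+--------------+--------------+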

These results show that the problem reported in [3] is solved, while
pSeries boot time is essentially unchanged.

With a non-compressed kernel, the difference with PowerNV is smaller,
and pSeries stays the same:

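+-----+------------------------------+-----------------------------+
|     |           PowerNV            |           pSeries           |
|-smp |---------------+--------------+--------------+--------------+
|     |    master     | patch series |    master    | patch series |
+-----+---------------+--------------+--------------+--------------+
|  1  |  42,17 ± 0,92 | 38,13 ± 0,59 | 23,15 ± 1,02 | 23,46 ± 1,02 |
|  2  |  55,72 ± 3,54 | 40,30 ± 0,56 | 26,26 ± 0,82 | 26,38 ± 0,80 |
|  4  |  67,09 ± 3,02 | 38,26 ± 0,47 | 28,36 ± 0,77 | 28,19 ± 0,78 |
|  6  |  98,96 ± 2,49 | 39,01 ± 0,38 | 28,68 ± 0,75 | 29,02 ± 0,88 |
|  8  |  39,68 ± 0,42 | 38,44 ± 0,41 | 29,24 ± 0,81 | 29,44 ± 0,75 |
+-----+---------------+--------------+--------------+--------------+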

Finally, using command lines like

./qemu-system-ppc64 -M powernv9 -cpu POWER9 -accel tcg,thread=multi \
-m 8G -smp 4 -device virtio-scsi-pci -boot c -vga none -nographic \
-device nvme,bus=pcie.2,addr=0x0,drive=drive0,serial=1234 \
-drive file=rootfs.ext2,if=none,id=drive0,format=raw,cache=none \
-snapshot -serial pipe:pipe -monitor unix:mon,server,nowait \
-kernel zImage -append 'console=hvc0 rootwait root=/dev/nvme0n1' \
-device virtio-net-pci,netdev=br0,mac=52:54:00:12:34:57,bus=pcie.0 \
-netdev bridge,id=br0

and

./qemu-system-ppc64 -M pseries -cpu POWER9 -accel tcg,thread=multi \
-m 8G -smp 4 -device virtio-scsi-pci -boot c -vga none -nographic \
-drive file=rootfs.ext2,if=scsi,index=0,format=raw -snapshot \
-kernel zImage -append 'console=hvc0 rootwait root=/dev/sda' \
-serial pipe:pipe -monitor unix:mon,server,nowait \
-device virtio-net-pci,netdev=br0,mac=52:54:00:12:34:57 \
-netdev bridge,id=br0

to test I/O performance, with iperf to test network and a 4Gb scp
transfer to test disk+network, in 100 iterations we saw:

+-+---+-+
| |scp (s)|   iperf (MB/s)  |
+-+---+-+
|PowerNV master   |

[RFC PATCH v2 01/29] target/ppc: define PPC_INTERRUPT_* values directly

2022-09-27 Thread Matheus Ferst

This enum defines the bit positions in env->pending_interrupts for each
interrupt. However, except for the comparison in kvmppc_set_interrupt,
the values are always used as (1 << PPC_INTERRUPT_*). Define them
directly like that to save some clutter. No functional change intended.
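
A minimal sketch of the equivalence, under stated assumptions: the `OLD_*`/`NEW_*` names and both helper functions are hypothetical and model only the first three enum entries. They are not the QEMU definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only three entries are modelled. */
enum { OLD_RESET = 0, OLD_WAKEUP, OLD_MCK };               /* bit positions */
enum { NEW_RESET = 0x1, NEW_WAKEUP = 0x2, NEW_MCK = 0x4 }; /* bit masks */

/* Each mask must equal 1 shifted by the old bit position. */
_Static_assert(NEW_RESET == (1 << OLD_RESET), "RESET mask mismatch");
_Static_assert(NEW_WAKEUP == (1 << OLD_WAKEUP), "WAKEUP mask mismatch");
_Static_assert(NEW_MCK == (1 << OLD_MCK), "MCK mask mismatch");

/* Old style: every user has to shift the bit position. */
static int old_is_pending(uint32_t pending, int n_irq)
{
    return (pending & (1u << n_irq)) != 0;
}

/* New style: the enum value is used directly as a mask. */
static int new_is_pending(uint32_t pending, int irq)
{
    assert(irq != 0);
    return (pending & (uint32_t)irq) != 0;
}
```

Both helpers give the same answer for the same pending word. The second form only drops the shift at every call site.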

Reviewed-by: David Gibson 
Signed-off-by: Matheus Ferst 
---
 hw/ppc/ppc.c | 10 +++---
 hw/ppc/trace-events  |  2 +-
 target/ppc/cpu.h | 40 +++---
 target/ppc/cpu_init.c| 56 +++---
 target/ppc/excp_helper.c | 74 
 target/ppc/misc_helper.c |  6 ++--
 6 files changed, 94 insertions(+), 94 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 690f448cb9..77e611e81c 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -40,7 +40,7 @@
 static void cpu_ppc_tb_stop (CPUPPCState *env);
 static void cpu_ppc_tb_start (CPUPPCState *env);
 
-void ppc_set_irq(PowerPCCPU *cpu, int n_IRQ, int level)
+void ppc_set_irq(PowerPCCPU *cpu, int irq, int level)
 {
 CPUState *cs = CPU(cpu);
 CPUPPCState *env = &cpu->env;
@@ -56,21 +56,21 @@ void ppc_set_irq(PowerPCCPU *cpu, int n_IRQ, int level)
 old_pending = env->pending_interrupts;
 
 if (level) {
-env->pending_interrupts |= 1 << n_IRQ;
+env->pending_interrupts |= irq;
 cpu_interrupt(cs, CPU_INTERRUPT_HARD);
 } else {
-env->pending_interrupts &= ~(1 << n_IRQ);
+env->pending_interrupts &= ~irq;
 if (env->pending_interrupts == 0) {
 cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
 }
 }
 
 if (old_pending != env->pending_interrupts) {
-kvmppc_set_interrupt(cpu, n_IRQ, level);
+kvmppc_set_interrupt(cpu, irq, level);
 }
 
 
-trace_ppc_irq_set_exit(env, n_IRQ, level, env->pending_interrupts,
+trace_ppc_irq_set_exit(env, irq, level, env->pending_interrupts,
CPU(cpu)->interrupt_request);
 
 if (locked) {
diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
index a07d5aca0f..956938ebcd 100644
--- a/hw/ppc/trace-events
+++ b/hw/ppc/trace-events
@@ -127,7 +127,7 @@ ppc40x_set_tb_clk(uint32_t value) "new frequency %" PRIu32
 ppc40x_timers_init(uint32_t value) "frequency %" PRIu32
 
 ppc_irq_set(void *env, uint32_t pin, uint32_t level) "env [%p] pin %d level %d"
-ppc_irq_set_exit(void *env, uint32_t n_IRQ, uint32_t level, uint32_t pending, uint32_t request) "env [%p] n_IRQ %d level %d => pending 0x%08" PRIx32 " req 0x%08" PRIx32
+ppc_irq_set_exit(void *env, uint32_t irq, uint32_t level, uint32_t pending, uint32_t request) "env [%p] irq 0x%05" PRIx32 " level %d => pending 0x%08" PRIx32 " req 0x%08" PRIx32
 ppc_irq_set_state(const char *name, uint32_t level) "\"%s\" level %d"
 ppc_irq_reset(const char *name) "%s"
 ppc_irq_cpu(const char *action) "%s"
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 7f73e2ac81..9ccd23db04 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2416,27 +2416,27 @@ enum {
 /* Hardware exceptions definitions */
 enum {
 /* External hardware exception sources */
-PPC_INTERRUPT_RESET = 0,  /* Reset exception  */
-PPC_INTERRUPT_WAKEUP, /* Wakeup exception */
-PPC_INTERRUPT_MCK,/* Machine check exception  */
-PPC_INTERRUPT_EXT,/* External interrupt   */
-PPC_INTERRUPT_SMI,/* System management interrupt  */
-PPC_INTERRUPT_CEXT,   /* Critical external interrupt  */
-PPC_INTERRUPT_DEBUG,  /* External debug exception */
-PPC_INTERRUPT_THERM,  /* Thermal exception*/
+PPC_INTERRUPT_RESET  = 0x00001,  /* Reset exception               */
+PPC_INTERRUPT_WAKEUP = 0x00002,  /* Wakeup exception              */
+PPC_INTERRUPT_MCK    = 0x00004,  /* Machine check exception       */
+PPC_INTERRUPT_EXT    = 0x00008,  /* External interrupt            */
+PPC_INTERRUPT_SMI    = 0x00010,  /* System management interrupt   */
+PPC_INTERRUPT_CEXT   = 0x00020,  /* Critical external interrupt   */
+PPC_INTERRUPT_DEBUG  = 0x00040,  /* External debug exception      */
+PPC_INTERRUPT_THERM  = 0x00080,  /* Thermal exception             */
 /* Internal hardware exception sources */
-PPC_INTERRUPT_DECR,   /* Decrementer exception*/
-PPC_INTERRUPT_HDECR,  /* Hypervisor decrementer exception */
-PPC_INTERRUPT_PIT,/* Programmable interval timer interrupt */
-PPC_INTERRUPT_FIT,/* Fixed interval timer interrupt   */
-PPC_INTERRUPT_WDT,/* Watchdog timer interrupt */
-PPC_INTERRUPT_CDOORBELL,  /* Critical doorbell interrupt  */
-PPC_INTERRUPT_DOORBELL,   /* Doorbell interrupt   */
-

Re: [Virtio-fs] virtiofsd: Any reason why there's not an "openat2" sandbox mode?

2022-09-27 Thread Stefan Hajnoczi

On Tue, Sep 27, 2022 at 01:51:41PM -0400, Colin Walters wrote:
> 
> 
> On Tue, Sep 27, 2022, at 1:27 PM, German Maglione wrote:
> >
> >> > Now all the development has moved to rust virtiofsd.
> 
> Oh, awesome!!  The code there looks great.
> 
> > I could work on this for the next major version and see if anything breaks.
> > But I prefer to add this as a compilation feature, instead of a command line
> > option that we will then have to maintain for a while.
> 
> Hmm, what would be the issue with having the code there by default?  I think 
> rather than any new command line option, we automatically use 
> `openat2+RESOLVE_IN_ROOT` if the process is run as a nonzero uid.
> 
> > Also, I don't see it as a sandbox feature, as Stefan mentioned, a 
> > compromised
> > process can call openat2() without RESOLVE_IN_ROOT. 
> 
> I'm a bit skeptical honestly about how secure the existing namespace code is 
> against a compromised virtiofsd process.  The primary worry is guest 
> filesystem traversals, right?  openat2+RESOLVE_IN_ROOT addresses that.  Plus 
> being in Rust makes this dramatically safer.
> 
> > I did some test with
> > Landlock to lock virtiofsd inside the shared directory, but IIRC it 
> > requires a
> > kernel 5.13
> 
> But yes, landlock and other things make sense, I just don't see these things 
> as strongly linked.  IOW we shouldn't in my opinion block unprivileged 
> virtiofsd on more sandboxing than openat2 already gives us.

I think openat2(RESOLVE_IN_ROOT) support should be added unless there is
another unprivileged mechanism that is stronger.

The security implications need to be covered in the user documentation
so people can decide whether using this mode is appropriate.

We should continue to explain the difference between a voluntary
mechanism like openat2(RESOLVE_IN_ROOT) and a mandatory mechanism like
mount namespaces with pivot_root(2). Rust programs are not immune to
arbitrary code execution, but it's less likely than with a C program.
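
For readers unfamiliar with the voluntary mechanism, a minimal sketch of an openat2(2) call with RESOLVE_IN_ROOT follows. It assumes Linux 5.6 or later with matching kernel headers. `open_in_root()` is a hypothetical helper written for this illustration and is not virtiofsd code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/openat2.h>   /* struct open_how, RESOLVE_IN_ROOT */
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Open `path` read-only, resolving it as if `rootfd` were "/".
 * Returns a file descriptor, or -1 with errno set (ENOSYS on
 * kernels without openat2).
 */
static int open_in_root(int rootfd, const char *path)
{
    struct open_how how = {
        .flags = O_RDONLY | O_CLOEXEC,
        .resolve = RESOLVE_IN_ROOT,
    };

    assert(path != NULL);
    return (int)syscall(SYS_openat2, rootfd, path, &how, sizeof(how));
}
```

With this flag, ".." components and absolute symlinks in the path are resolved relative to the directory behind `rootfd`. The protection only covers callers that choose to go through such a helper, which is why it counts as voluntary.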

Stefan



[PATCH v5 12/12] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint

2022-09-27 Thread Stefan Hajnoczi

Register guest RAM using BlockRAMRegistrar and set the
BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory
accesses in I/O requests.

This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely
on DMA mapping/unmapping.

Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio-blk.h |  2 ++
 hw/block/virtio-blk.c  | 39 ++
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index d311c57cca..7f589b4146 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -19,6 +19,7 @@
 #include "hw/block/block.h"
 #include "sysemu/iothread.h"
 #include "sysemu/block-backend.h"
+#include "sysemu/block-ram-registrar.h"
 #include "qom/object.h"
 
 #define TYPE_VIRTIO_BLK "virtio-blk-device"
@@ -64,6 +65,7 @@ struct VirtIOBlock {
 struct VirtIOBlockDataPlane *dataplane;
 uint64_t host_features;
 size_t config_size;
+BlockRAMRegistrar blk_ram_registrar;
 };
 
 typedef struct VirtIOBlockReq {
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e9ba752f6b..907f012c45 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -21,6 +21,7 @@
 #include "hw/block/block.h"
 #include "hw/qdev-properties.h"
 #include "sysemu/blockdev.h"
+#include "sysemu/block-ram-registrar.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/runstate.h"
 #include "hw/virtio/virtio-blk.h"
@@ -384,12 +385,14 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
 }
 }
 
-static inline void submit_requests(BlockBackend *blk, MultiReqBuffer *mrb,
+static inline void submit_requests(VirtIOBlock *s, MultiReqBuffer *mrb,
int start, int num_reqs, int niov)
 {
+BlockBackend *blk = s->blk;
 QEMUIOVector *qiov = &mrb->reqs[start]->qiov;
 int64_t sector_num = mrb->reqs[start]->sector_num;
 bool is_write = mrb->is_write;
+BdrvRequestFlags flags = 0;
 
 if (num_reqs > 1) {
 int i;
@@ -420,12 +423,18 @@ static inline void submit_requests(BlockBackend *blk, MultiReqBuffer *mrb,
   num_reqs - 1);
 }
 
+if (blk_ram_registrar_ok(&s->blk_ram_registrar)) {
+flags |= BDRV_REQ_REGISTERED_BUF;
+}
+
 if (is_write) {
-blk_aio_pwritev(blk, sector_num << BDRV_SECTOR_BITS, qiov, 0,
-virtio_blk_rw_complete, mrb->reqs[start]);
+blk_aio_pwritev(blk, sector_num << BDRV_SECTOR_BITS, qiov,
+flags, virtio_blk_rw_complete,
+mrb->reqs[start]);
 } else {
-blk_aio_preadv(blk, sector_num << BDRV_SECTOR_BITS, qiov, 0,
-   virtio_blk_rw_complete, mrb->reqs[start]);
+blk_aio_preadv(blk, sector_num << BDRV_SECTOR_BITS, qiov,
+   flags, virtio_blk_rw_complete,
+   mrb->reqs[start]);
 }
 }
 
@@ -447,14 +456,14 @@ static int multireq_compare(const void *a, const void *b)
 }
 }
 
-static void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
+static void virtio_blk_submit_multireq(VirtIOBlock *s, MultiReqBuffer *mrb)
 {
 int i = 0, start = 0, num_reqs = 0, niov = 0, nb_sectors = 0;
 uint32_t max_transfer;
 int64_t sector_num = 0;
 
 if (mrb->num_reqs == 1) {
-submit_requests(blk, mrb, 0, 1, -1);
+submit_requests(s, mrb, 0, 1, -1);
 mrb->num_reqs = 0;
 return;
 }
@@ -474,11 +483,11 @@ static void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
  * 3. merge would exceed maximum transfer length of backend device
  */
 if (sector_num + nb_sectors != req->sector_num ||
-niov > blk_get_max_iov(blk) - req->qiov.niov ||
+niov > blk_get_max_iov(s->blk) - req->qiov.niov ||
 req->qiov.size > max_transfer ||
 nb_sectors > (max_transfer -
   req->qiov.size) / BDRV_SECTOR_SIZE) {
-submit_requests(blk, mrb, start, num_reqs, niov);
+submit_requests(s, mrb, start, num_reqs, niov);
 num_reqs = 0;
 }
 }
@@ -494,7 +503,7 @@ static void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
 num_reqs++;
 }
 
-submit_requests(blk, mrb, start, num_reqs, niov);
+submit_requests(s, mrb, start, num_reqs, niov);
 mrb->num_reqs = 0;
 }
 
@@ -509,7 +518,7 @@ static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb)
  * Make sure all outstanding writes are posted to the backing device.
  */
 if (mrb->is_write && mrb->num_reqs > 0) {
-virtio_blk_submit_multireq(s->blk, mrb);
+virtio_blk_submit_multireq(s, mrb);
 }
 blk_aio_flush(s->blk, virtio_blk_flush_complete, req);
 }
@@ -689,7 +698,7 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req,

[PATCH v5 07/12] block: return errors from bdrv_register_buf()

2022-09-27 Thread Stefan Hajnoczi

Registering an I/O buffer is only a performance optimization hint but it
is still necessary to return errors when it fails.

Later patches will need to detect errors when registering buffers but an
immediate advantage is that error_report() calls are no longer needed in
block driver .bdrv_register_buf() functions.
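
The register-then-roll-back pattern that returning errors makes possible can be sketched as follows. This is a simplified model under stated assumptions: `toy_node`, `toy_node_register()`, `toy_node_unregister()` and `toy_register_all()` are hypothetical, and a flat array stands in for the graph of child nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_node {
    bool fail_register;   /* test hook: make registration fail here */
    int registered;       /* number of buffers currently registered */
};

static bool toy_node_register(struct toy_node *n)
{
    if (n->fail_register) {
        return false;
    }
    n->registered++;
    return true;
}

static void toy_node_unregister(struct toy_node *n)
{
    assert(n->registered > 0);
    n->registered--;
}

/* Register on every node in order; on failure undo nodes [0, i). */
static bool toy_register_all(struct toy_node *nodes, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!toy_node_register(&nodes[i])) {
            while (i-- > 0) {
                toy_node_unregister(&nodes[i]);
            }
            return false;
        }
    }
    return true;
}
```

A caller sees either full success or no change at all, so a failed registration never leaves some nodes mapped and others not.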

Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-global-state.h  |  5 ++-
 include/block/block_int-common.h|  5 ++-
 include/sysemu/block-backend-global-state.h |  2 +-
 block/block-backend.c   |  4 +--
 block/io.c  | 34 +++--
 block/nvme.c| 18 +--
 qemu-img.c  |  2 +-
 7 files changed, 52 insertions(+), 18 deletions(-)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 7901f35863..eba4ed23b4 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -246,8 +246,11 @@ void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
  *
  * Buffers must not overlap and they must be unregistered with the same <host,
  * size> values that they were registered with.
+ *
+ * Returns: true on success, false on failure
  */
-void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size);
+bool bdrv_register_buf(BlockDriverState *bs, void *host, size_t size,
+   Error **errp);
 void bdrv_unregister_buf(BlockDriverState *bs, void *host, size_t size);
 
 void bdrv_cancel_in_flight(BlockDriverState *bs);
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 19798d0e77..9c569be162 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -433,8 +433,11 @@ struct BlockDriver {
  * that it can do IOMMU mapping with VFIO etc., in order to get better
  * performance. In the case of VFIO drivers, this callback is used to do
  * DMA mapping for hot buffers.
+ *
+ * Returns: true on success, false on failure
  */
-void (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size);
+bool (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size,
+  Error **errp);
 void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host, size_t size);
 
 /*
diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h
index 97f7dad2c3..6858e39cb6 100644
--- a/include/sysemu/block-backend-global-state.h
+++ b/include/sysemu/block-backend-global-state.h
@@ -106,7 +106,7 @@ void blk_io_limits_enable(BlockBackend *blk, const char *group);
 void blk_io_limits_update_group(BlockBackend *blk, const char *group);
 void blk_set_force_allow_inactivate(BlockBackend *blk);
 
-void blk_register_buf(BlockBackend *blk, void *host, size_t size);
+bool blk_register_buf(BlockBackend *blk, void *host, size_t size, Error **errp);
 void blk_unregister_buf(BlockBackend *blk, void *host, size_t size);
 
 const BdrvChild *blk_root(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index 99141f8f06..34399d3b7b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2545,10 +2545,10 @@ static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter)
 }
 }
 
-void blk_register_buf(BlockBackend *blk, void *host, size_t size)
+bool blk_register_buf(BlockBackend *blk, void *host, size_t size, Error **errp)
 {
 GLOBAL_STATE_CODE();
-bdrv_register_buf(blk_bs(blk), host, size);
+return bdrv_register_buf(blk_bs(blk), host, size, errp);
 }
 
 void blk_unregister_buf(BlockBackend *blk, void *host, size_t size)
diff --git a/block/io.c b/block/io.c
index de04594299..1c3392c7f6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3307,17 +3307,45 @@ void bdrv_io_unplug(BlockDriverState *bs)
 }
 }
 
-void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size)
+/* Helper that undoes bdrv_register_buf() when it fails partway through */
+static void bdrv_register_buf_rollback(BlockDriverState *bs,
+   void *host,
+   size_t size,
+   BdrvChild *final_child)
+{
+BdrvChild *child;
+
+QLIST_FOREACH(child, &bs->children, next) {
+if (child == final_child) {
+break;
+}
+
+bdrv_unregister_buf(child->bs, host, size);
+}
+
+if (bs->drv && bs->drv->bdrv_unregister_buf) {
+bs->drv->bdrv_unregister_buf(bs, host, size);
+}
+}
+
+bool bdrv_register_buf(BlockDriverState *bs, void *host, size_t size,
+   Error **errp)
 {
 BdrvChild *child;
 
 GLOBAL_STATE_CODE();
 if (bs->drv && bs->drv->bdrv_register_buf) {
-bs->drv->bdrv_register_buf(bs, host, size);
+if (!bs->drv->bdrv_register_buf(bs, host, size, errp)) {
+return false;
+}

Re: [PATCH 1/1] 9pfs: avoid iterator invalidation in v9fs_mark_fids_unreclaim

2022-09-27 Thread Greg Kurz

On Tue, 27 Sep 2022 19:14:33 +0200
Christian Schoenebeck  wrote:

> On Dienstag, 27. September 2022 15:05:13 CEST Linus Heckemann wrote:
> > Christian Schoenebeck  writes:
> > > Ah, you sent this fix as a separate patch on top. I actually just meant
> > > that you would take my already queued patch as the latest version (just
> > > because I had made some minor changes on my end) and adjust that patch
> > > further as v4.
> > > 
> > > Anyway, there are still some things to do here, so maybe you can send your
> > > patch squashed in the next round ...
> > 
> > I see, will do!
> > 
> > >> @Christian: I still haven't been able to reproduce the issue that this
> > >> commit is supposed to fix (I tried building KDE too, no problems), so
> > >> it's a bit of a shot in the dark. It certainly still runs and I think it
> > >> should fix the issue, but it would be great if you could test it.
> > > 
> > > No worries about reproduction, I will definitely test this thoroughly. ;-)
> > > 
> > >>  hw/9pfs/9p.c | 46 ++
> > >>  1 file changed, 30 insertions(+), 16 deletions(-)
> > >> 
> > >> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > >> index f4c1e37202..825c39e122 100644
> > >> --- a/hw/9pfs/9p.c
> > >> +++ b/hw/9pfs/9p.c
> > >> @@ -522,33 +522,47 @@ static int coroutine_fn
> > >> v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path) V9fsFidState
> > >> *fidp;
> > >> 
> > >>  gpointer fid;
> > >>  GHashTableIter iter;
> > >> 
> > >> +/*
> > >> + * The most common case is probably that we have exactly one
> > >> + * fid for the given path, so preallocate exactly one.
> > >> + */
> > >> +GArray *to_reopen = g_array_sized_new(FALSE, FALSE,
> > >> sizeof(V9fsFidState*), 1); +gint i;
> > > 
> > > Please use `g_autoptr(GArray)` instead of `GArray *`, that avoids the need
> > > for explicit calls to g_array_free() below.
> > 
> > Good call. I'm not familiar with glib, so I didn't know about this :)
> > 
> > >> -fidp->flags |= FID_NON_RECLAIMABLE;
> > > 
> > > Why did you remove that? It should still be marked as FID_NON_RECLAIMABLE,
> > > no?
> > Indeed, that was an accident.
> > 
> > >> +/*
> > >> + * Ensure the fid survives a potential clunk request during
> > >> + * v9fs_reopen_fid or put_fid.
> > >> + */
> > >> +fidp->ref++;
> > > 
> > > Hmm, bumping the refcount here makes sense, as the 2nd loop may be
> > > interrupted and the fid otherwise disappear in between, but ...
> > > 
> > >> +g_array_append_val(to_reopen, fidp);
> > >> 
> > >>  }
> > >> 
> > >> +}
> > >> 
> > >> -/* We're done with this fid */
> > >> +for (i=0; i < to_reopen->len; i++) {
> > >> +fidp = g_array_index(to_reopen, V9fsFidState*, i);
> > >> +/* reopen the file/dir if already closed */
> > >> +err = v9fs_reopen_fid(pdu, fidp);
> > >> +if (err < 0) {
> > >> +put_fid(pdu, fidp);
> > >> +g_array_free(to_reopen, TRUE);
> > >> +return err;
> > > 
> > > ... this return would then leak all remainder fids that you have bumped
> > > the
> > > refcount for above already.
> > 
> > You're right. I think the best way around it, though it feels ugly, is
> > to add a third loop in an "out:".
> 
> Either that, or continuing the loop to the end. Not that this would become 
> much prettier. I must admit I also don't really have a good idea for a clean 
> solution in this case.
> 
> > > Also: I noticed that your changes in virtfs_reset() would need the same
> > > 2-loop hack to avoid hash iterator invalidation, as it would also call
> > > put_fid() inside the loop and be prone for hash iterator invalidation
> > > otherwise.
> > Good point. Will do.
> > 
> > One more thing has occurred to me. I think the reclaiming/reopening
> > logic will misbehave in the following sequence of events:
> > 
> > 1. QEMU reclaims an open fid, losing the file handle
> > 2. The file referred to by the fid is replaced with a different file
> >(e.g. via rename or symlink) outside QEMU
> > 3. The file is accessed again by the guest, causing QEMU to reopen a
> >_different file_ from before without the guest having performed any
> >operations that should cause this to happen.
> > 
> > This is neither introduced nor resolved by my changes. Am I overlooking
> > something that avoids this (be it documentation that directories exposed
> > via 9p should not be touched by the host), or is this a real issue? I'm
> > thinking one could at least detect it by saving inode numbers in
> > V9fsFidState and comparing them when reopening, but recovering from such
> > a situation seems difficult.
> 
> Well, in that specific scenario when rename/move happens outside of QEMU then 
> yes, this might happen unfortunately. The point of this "reclaim fid" stuff 
> is 
> to deal with the fact that there is an upper limit on systems for the max. 
> amount of

[PATCH v5 10/12] stubs: add qemu_ram_block_from_host() and qemu_ram_get_fd()

2022-09-27 Thread Stefan Hajnoczi

The blkio block driver will need to look up the file descriptor for a
given pointer. This is possible in softmmu builds where the RAMBlock API
is available for querying guest RAM.

Add stubs so tools like qemu-img that link the block layer still build
successfully. In this case there is no guest RAM but that is fine.
Bounce buffers and their file descriptors will be allocated with
libblkio's blkio_alloc_mem_region() so we won't rely on QEMU's
qemu_ram_get_fd() in that case.

Signed-off-by: Stefan Hajnoczi 
---
 stubs/physmem.c   | 13 +
 stubs/meson.build |  1 +
 2 files changed, 14 insertions(+)
 create mode 100644 stubs/physmem.c

diff --git a/stubs/physmem.c b/stubs/physmem.c
new file mode 100644
index 00..1fc5f2df29
--- /dev/null
+++ b/stubs/physmem.c
@@ -0,0 +1,13 @@
+#include "qemu/osdep.h"
+#include "exec/cpu-common.h"
+
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+   ram_addr_t *offset)
+{
+return NULL;
+}
+
+int qemu_ram_get_fd(RAMBlock *rb)
+{
+return -1;
+}
diff --git a/stubs/meson.build b/stubs/meson.build
index d8f3fd5c44..4314161f5f 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -29,6 +29,7 @@ stub_ss.add(files('migr-blocker.c'))
 stub_ss.add(files('module-opts.c'))
 stub_ss.add(files('monitor.c'))
 stub_ss.add(files('monitor-core.c'))
+stub_ss.add(files('physmem.c'))
 stub_ss.add(files('qemu-timer-notify-cb.c'))
 stub_ss.add(files('qmp_memory_device.c'))
 stub_ss.add(files('qmp-command-available.c'))
-- 
2.37.3

[PATCH v5 11/12] blkio: implement BDRV_REQ_REGISTERED_BUF optimization

2022-09-27 Thread Stefan Hajnoczi

Avoid bounce buffers when QEMUIOVector elements are within previously
registered bdrv_register_buf() buffers.

The idea is that emulated storage controllers will register guest RAM
using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF on I/O
requests. Therefore no blkio_map_mem_region() calls are necessary in the
performance-critical I/O code path.

This optimization doesn't apply if the I/O buffer is internally
allocated by QEMU (e.g. qcow2 metadata). There we still take the slow
path because BDRV_REQ_REGISTERED_BUF is not set.
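
The fast-path/slow-path decision described above can be sketched in
isolation. This is only an illustration, not the driver code: the value 0x8
matches the BDRV_REQ_REGISTERED_BUF definition added elsewhere in this
series, while DemoState and demo_use_bounce_buffer() are hypothetical names
that exist only in this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same numeric value the series assigns to BDRV_REQ_REGISTERED_BUF. */
#define DEMO_REQ_REGISTERED_BUF 0x8u

/* Hypothetical stand-in for the one BDRVBlkioState field used here. */
typedef struct {
    bool needs_mem_regions;
} DemoState;

/*
 * A bounce buffer is only needed when the driver requires memory regions
 * and the caller did not promise that the buffers are already registered.
 */
static bool demo_use_bounce_buffer(const DemoState *s, unsigned int flags)
{
    assert(s != NULL);
    return s->needs_mem_regions && !(flags & DEMO_REQ_REGISTERED_BUF);
}
```

A request for QEMU-internal memory (no flag) on a driver that needs memory
regions takes the bounce buffer path; the same request with the flag set
does not.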

Signed-off-by: Stefan Hajnoczi 
---
 block/blkio.c | 174 +-
 1 file changed, 171 insertions(+), 3 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 9244a653ef..ed6ec7f167 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -11,9 +11,13 @@
 #include "qemu/osdep.h"
 #include 
 #include "block/block_int.h"
+#include "exec/memory.h"
+#include "exec/cpu-common.h" /* for qemu_ram_get_fd() */
 #include "qapi/error.h"
+#include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
 #include "qemu/module.h"
+#include "exec/memory.h" /* for ram_block_discard_disable() */
 
 /*
  * Keep the QEMU BlockDriver names identical to the libblkio driver names.
@@ -72,6 +76,12 @@ typedef struct {
 
 /* Can we skip adding/deleting blkio_mem_regions? */
 bool needs_mem_regions;
+
+/* Are file descriptors necessary for blkio_mem_regions? */
+bool needs_mem_region_fd;
+
+/* Are madvise(MADV_DONTNEED)-style operations unavailable? */
+bool mem_regions_pinned;
 } BDRVBlkioState;
 
 /* Called with s->bounce_lock held */
@@ -346,7 +356,8 @@ blkio_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
 .coroutine = qemu_coroutine_self(),
 };
 BDRVBlkioState *s = bs->opaque;
-bool use_bounce_buffer = s->needs_mem_regions;
+bool use_bounce_buffer =
+s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
 BlkioBounceBuf bounce;
 struct iovec *iov = qiov->iov;
 int iovcnt = qiov->niov;
@@ -389,7 +400,8 @@ static int coroutine_fn blkio_co_pwritev(BlockDriverState *bs, int64_t offset,
 .coroutine = qemu_coroutine_self(),
 };
 BDRVBlkioState *s = bs->opaque;
-bool use_bounce_buffer = s->needs_mem_regions;
+bool use_bounce_buffer =
+s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
 BlkioBounceBuf bounce;
 struct iovec *iov = qiov->iov;
 int iovcnt = qiov->niov;
@@ -472,6 +484,117 @@ static void blkio_io_unplug(BlockDriverState *bs)
 }
 }
 
+typedef enum {
+BMRR_OK,
+BMRR_SKIP,
+BMRR_FAIL,
+} BlkioMemRegionResult;
+
+/*
+ * Produce a struct blkio_mem_region for a given address and size.
+ *
+ * This function produces identical results when called multiple times with the
+ * same arguments. This property is necessary because blkio_unmap_mem_region()
+ * must receive the same struct blkio_mem_region field values that were passed
+ * to blkio_map_mem_region().
+ */
+static BlkioMemRegionResult
+blkio_mem_region_from_host(BlockDriverState *bs,
+   void *host, size_t size,
+   struct blkio_mem_region *region,
+   Error **errp)
+{
+BDRVBlkioState *s = bs->opaque;
+int fd = -1;
+ram_addr_t fd_offset = 0;
+
+if (((uintptr_t)host | size) % s->mem_region_alignment) {
+error_setg(errp, "unaligned buf %p with size %zu", host, size);
+return BMRR_FAIL;
+}
+
+/* Attempt to find the fd for the underlying memory */
+if (s->needs_mem_region_fd) {
+RAMBlock *ram_block;
+RAMBlock *end_block;
+ram_addr_t offset;
+
+/*
+ * bdrv_register_buf() is called with the BQL held so mr lives at least
+ * until this function returns.
+ */
+ram_block = qemu_ram_block_from_host(host, false, &fd_offset);
+if (ram_block) {
+fd = qemu_ram_get_fd(ram_block);
+}
+if (fd == -1) {
+/*
+ * Ideally every RAMBlock would have an fd. pc-bios and other
+ * things don't. Luckily they are usually not I/O buffers and we
+ * can just ignore them.
+ */
+return BMRR_SKIP;
+}
+
+/* Make sure the fd covers the entire range */
+end_block = qemu_ram_block_from_host(host + size - 1, false, &offset);
+if (ram_block != end_block) {
+error_setg(errp, "registered buffer at %p with size %zu extends "
+   "beyond RAMBlock", host, size);
+return BMRR_FAIL;
+}
+}
+
+*region = (struct blkio_mem_region){
+.addr = host,
+.len = size,
+.fd = fd,
+.fd_offset = fd_offset,
+};
+return BMRR_OK;
+}
+
+static bool blkio_register_buf(BlockDriverState *bs, void *host, size_t size,
+   Error **errp)
+{
+BDRVBlkioState *s = bs->opaque;
+struct

[PATCH v5 05/12] block: use BdrvRequestFlags type for supported flag fields

2022-09-27 Thread Stefan Hajnoczi

Use the enum type so GDB displays the enum members instead of printing a
numeric constant.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/block_int-common.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index b7a7cbd3a5..19798d0e77 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -1051,7 +1051,7 @@ struct BlockDriverState {
 /*
  * Flags honored during pread
  */
-unsigned int supported_read_flags;
+BdrvRequestFlags supported_read_flags;
 /*
  * Flags honored during pwrite (so far: BDRV_REQ_FUA,
  * BDRV_REQ_WRITE_UNCHANGED).
@@ -1069,12 +1069,12 @@ struct BlockDriverState {
  * flag), or they have to explicitly take the WRITE permission for
  * their children.
  */
-unsigned int supported_write_flags;
+BdrvRequestFlags supported_write_flags;
 /*
  * Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
  * BDRV_REQ_MAY_UNMAP, BDRV_REQ_WRITE_UNCHANGED)
  */
-unsigned int supported_zero_flags;
+BdrvRequestFlags supported_zero_flags;
 /*
  * Flags honoured during truncate (so far: BDRV_REQ_ZERO_WRITE).
  *
@@ -1082,7 +1082,7 @@ struct BlockDriverState {
  * that any added space reads as all zeros. If this can't be guaranteed,
  * the operation must fail.
  */
-unsigned int supported_truncate_flags;
+BdrvRequestFlags supported_truncate_flags;
 
 /* the following member gives a name to every node on the bs graph. */
 char node_name[32];
-- 
2.37.3

[PATCH v5 04/12] block: pass size to bdrv_unregister_buf()

2022-09-27 Thread Stefan Hajnoczi

The only implementor of bdrv_register_buf() is block/nvme.c, where the
size is not needed when unregistering a buffer. This is because
util/vfio-helpers.c can look up mappings by address.

Future block drivers that implement bdrv_register_buf() may not be able
to do their job given only the buffer address. Add a size argument to
bdrv_unregister_buf().

Also document the assumptions about
bdrv_register_buf()/bdrv_unregister_buf() calls. The same host and size
values that were given to bdrv_register_buf() must be given to
bdrv_unregister_buf().
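
To illustrate the documented assumption, here is a minimal sketch of a
registry that only releases a buffer when both the address and the size
match what was registered. The DemoRegistry type and the demo_*() helpers
are hypothetical and not part of QEMU; the sketch also does not check for
overlapping buffers, which callers are expected to avoid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_BUFS 8

typedef struct {
    void *host;
    size_t size;
    bool used;
} DemoBuf;

typedef struct {
    DemoBuf bufs[DEMO_MAX_BUFS];
} DemoRegistry;

/* Record a buffer in the first free slot; fails when the table is full. */
static bool demo_register_buf(DemoRegistry *r, void *host, size_t size)
{
    for (size_t i = 0; i < DEMO_MAX_BUFS; i++) {
        if (!r->bufs[i].used) {
            r->bufs[i] = (DemoBuf){ .host = host, .size = size, .used = true };
            return true;
        }
    }
    return false;
}

/* Succeeds only when both host and size match a registered entry. */
static bool demo_unregister_buf(DemoRegistry *r, void *host, size_t size)
{
    for (size_t i = 0; i < DEMO_MAX_BUFS; i++) {
        DemoBuf *b = &r->bufs[i];

        if (b->used && b->host == host && b->size == size) {
            b->used = false;
            return true;
        }
    }
    return false;
}
```

A driver that keys its mappings on address and length cannot find the entry
if the caller passes a different size, which is why the size must be passed
back unchanged.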

gcc 11.2.1 emits a spurious warning that img_bench()'s buf_size local
variable might be uninitialized, so it's necessary to silence the
compiler.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-global-state.h  | 5 -
 include/block/block_int-common.h| 2 +-
 include/sysemu/block-backend-global-state.h | 2 +-
 block/block-backend.c   | 4 ++--
 block/io.c  | 6 +++---
 block/nvme.c| 2 +-
 qemu-img.c  | 4 ++--
 7 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 21265e3966..7901f35863 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -243,9 +243,12 @@ void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
  * Register/unregister a buffer for I/O. For example, VFIO drivers are
  * interested to know the memory areas that would later be used for I/O, so
  * that they can prepare IOMMU mapping etc., to get better performance.
+ *
+ * Buffers must not overlap and they must be unregistered with the same <host,
+ * size> values that they were registered with.
  */
 void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size);
-void bdrv_unregister_buf(BlockDriverState *bs, void *host);
+void bdrv_unregister_buf(BlockDriverState *bs, void *host, size_t size);
 
 void bdrv_cancel_in_flight(BlockDriverState *bs);
 
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 8947abab76..b7a7cbd3a5 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -435,7 +435,7 @@ struct BlockDriver {
  * DMA mapping for hot buffers.
  */
 void (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size);
-void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host);
+void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host, size_t size);
 
 /*
  * This field is modified only under the BQL, and is part of
diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h
index 415f0c91d7..97f7dad2c3 100644
--- a/include/sysemu/block-backend-global-state.h
+++ b/include/sysemu/block-backend-global-state.h
@@ -107,7 +107,7 @@ void blk_io_limits_update_group(BlockBackend *blk, const char *group);
 void blk_set_force_allow_inactivate(BlockBackend *blk);
 
 void blk_register_buf(BlockBackend *blk, void *host, size_t size);
-void blk_unregister_buf(BlockBackend *blk, void *host);
+void blk_unregister_buf(BlockBackend *blk, void *host, size_t size);
 
 const BdrvChild *blk_root(BlockBackend *blk);
 
diff --git a/block/block-backend.c b/block/block-backend.c
index d4a5df2ac2..99141f8f06 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2551,10 +2551,10 @@ void blk_register_buf(BlockBackend *blk, void *host, size_t size)
 bdrv_register_buf(blk_bs(blk), host, size);
 }
 
-void blk_unregister_buf(BlockBackend *blk, void *host)
+void blk_unregister_buf(BlockBackend *blk, void *host, size_t size)
 {
 GLOBAL_STATE_CODE();
-bdrv_unregister_buf(blk_bs(blk), host);
+bdrv_unregister_buf(blk_bs(blk), host, size);
 }
 
 int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
diff --git a/block/io.c b/block/io.c
index 0a8cbefe86..af85fa4bcc 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3305,16 +3305,16 @@ void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size)
 }
 }
 
-void bdrv_unregister_buf(BlockDriverState *bs, void *host)
+void bdrv_unregister_buf(BlockDriverState *bs, void *host, size_t size)
 {
 BdrvChild *child;
 
 GLOBAL_STATE_CODE();
 if (bs->drv && bs->drv->bdrv_unregister_buf) {
-bs->drv->bdrv_unregister_buf(bs, host);
+bs->drv->bdrv_unregister_buf(bs, host, size);
 }
 QLIST_FOREACH(child, &bs->children, next) {
-bdrv_unregister_buf(child->bs, host);
+bdrv_unregister_buf(child->bs, host, size);
 }
 }
 
diff --git a/block/nvme.c b/block/nvme.c
index 01fb28aa63..696502acea 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1592,7 +1592,7 @@ static void nvme_register_buf(BlockDriverState *bs, void *host, size_t size)
 }
 }
 
-static void nvme_unregister_buf(BlockDriverState *bs, void *host)
+static void nvme_unregister_buf(BlockDriverState *bs, void *host, size_t

[PATCH v5 09/12] exec/cpu-common: add qemu_ram_get_fd()

2022-09-27 Thread Stefan Hajnoczi

Add a function to get the file descriptor for a RAMBlock. Device
emulation code typically uses the MemoryRegion APIs but vhost-style code
may use RAMBlock directly for sharing guest memory with another process.

This new API will be used by the libblkio block driver so it can share
guest memory via .bdrv_register_buf().

Signed-off-by: Stefan Hajnoczi 
---
 include/exec/cpu-common.h | 1 +
 softmmu/physmem.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index d909429427..5bd73a9db5 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -91,6 +91,7 @@ void qemu_ram_set_uf_zeroable(RAMBlock *rb);
 bool qemu_ram_is_migratable(RAMBlock *rb);
 void qemu_ram_set_migratable(RAMBlock *rb);
 void qemu_ram_unset_migratable(RAMBlock *rb);
+int qemu_ram_get_fd(RAMBlock *rb);
 
 size_t qemu_ram_pagesize(RAMBlock *block);
 size_t qemu_ram_pagesize_largest(void);
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 56e03e07b5..d9578ccfd4 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1748,6 +1748,11 @@ void qemu_ram_unset_migratable(RAMBlock *rb)
 rb->flags &= ~RAM_MIGRATABLE;
 }
 
+int qemu_ram_get_fd(RAMBlock *rb)
+{
+return rb->fd;
+}
+
 /* Called with iothread lock held.  */
 void qemu_ram_set_idstr(RAMBlock *new_block, const char *name, DeviceState *dev)
 {
-- 
2.37.3

[PATCH v5 08/12] block: add BlockRAMRegistrar

2022-09-27 Thread Stefan Hajnoczi

Emulated devices and other BlockBackend users wishing to take advantage
of blk_register_buf() all have the same repetitive job: register
RAMBlocks with the BlockBackend using RAMBlockNotifier.

Add a BlockRAMRegistrar API to do this. A later commit will use this
from hw/block/virtio-blk.c.
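
The registrar relies on the usual embedded-notifier pattern: the callback
receives a pointer to the notifier member and recovers the enclosing
structure from it. A minimal standalone sketch follows. All Demo* names
are hypothetical, a byte counter stands in for the BlockBackend, and a NULL
host is used here only as an artificial way to simulate a registration
failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Portable container_of: recover the enclosing struct from a member pointer. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct DemoNotifier DemoNotifier;
struct DemoNotifier {
    void (*added)(DemoNotifier *n, void *host, size_t max_size);
};

typedef struct {
    size_t registered_bytes;   /* stands in for the BlockBackend */
    DemoNotifier notifier;
    bool ok;
} DemoRegistrar;

static void demo_added(DemoNotifier *n, void *host, size_t max_size)
{
    DemoRegistrar *r = demo_container_of(n, DemoRegistrar, notifier);

    if (host == NULL) {
        r->ok = false;         /* simulated failure: remember it */
        return;
    }
    r->registered_bytes += max_size;
}

static void demo_registrar_init(DemoRegistrar *r)
{
    r->registered_bytes = 0;
    r->notifier = (DemoNotifier){ .added = demo_added };
    r->ok = true;
}
```

Callers would check the ok field before relying on the registration, in the
same spirit as blk_ram_registrar_ok() in this patch.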

Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS  |  1 +
 include/sysemu/block-ram-registrar.h | 37 +++
 block/block-ram-registrar.c  | 54 
 block/meson.build|  1 +
 4 files changed, 93 insertions(+)
 create mode 100644 include/sysemu/block-ram-registrar.h
 create mode 100644 block/block-ram-registrar.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 878005f65b..1863724fa5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2498,6 +2498,7 @@ F: block*
 F: block/
 F: hw/block/
 F: include/block/
+F: include/sysemu/block-*.h
 F: qemu-img*
 F: docs/tools/qemu-img.rst
 F: qemu-io*
diff --git a/include/sysemu/block-ram-registrar.h b/include/sysemu/block-ram-registrar.h
new file mode 100644
index 00..d8b2f7942b
--- /dev/null
+++ b/include/sysemu/block-ram-registrar.h
@@ -0,0 +1,37 @@
+/*
+ * BlockBackend RAM Registrar
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef BLOCK_RAM_REGISTRAR_H
+#define BLOCK_RAM_REGISTRAR_H
+
+#include "exec/ramlist.h"
+
+/**
+ * struct BlockRAMRegistrar:
+ *
+ * Keeps RAMBlock memory registered with a BlockBackend using
+ * blk_register_buf() including hotplugged memory.
+ *
+ * Emulated devices or other BlockBackend users initialize a BlockRAMRegistrar
+ * with blk_ram_registrar_init() before submitting I/O requests with the
+ * BDRV_REQ_REGISTERED_BUF flag set.
+ */
+typedef struct {
+BlockBackend *blk;
+RAMBlockNotifier notifier;
+bool ok;
+} BlockRAMRegistrar;
+
+void blk_ram_registrar_init(BlockRAMRegistrar *r, BlockBackend *blk);
+void blk_ram_registrar_destroy(BlockRAMRegistrar *r);
+
+/* Have all RAMBlocks been registered successfully? */
+static inline bool blk_ram_registrar_ok(BlockRAMRegistrar *r)
+{
+return r->ok;
+}
+
+#endif /* BLOCK_RAM_REGISTRAR_H */
diff --git a/block/block-ram-registrar.c b/block/block-ram-registrar.c
new file mode 100644
index 00..32935006c1
--- /dev/null
+++ b/block/block-ram-registrar.c
@@ -0,0 +1,54 @@
+/*
+ * BlockBackend RAM Registrar
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/block-backend.h"
+#include "sysemu/block-ram-registrar.h"
+#include "qapi/error.h"
+
+static void ram_block_added(RAMBlockNotifier *n, void *host, size_t size,
+size_t max_size)
+{
+BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier);
+Error *err = NULL;
+
+if (!blk_register_buf(r->blk, host, max_size, &err)) {
+error_report_err(err);
+ram_block_notifier_remove(&r->notifier);
+r->ok = false;
+}
+}
+
+static void ram_block_removed(RAMBlockNotifier *n, void *host, size_t size,
+  size_t max_size)
+{
+BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier);
+blk_unregister_buf(r->blk, host, max_size);
+}
+
+void blk_ram_registrar_init(BlockRAMRegistrar *r, BlockBackend *blk)
+{
+r->blk = blk;
+r->notifier = (RAMBlockNotifier){
+.ram_block_added = ram_block_added,
+.ram_block_removed = ram_block_removed,
+
+/*
+ * .ram_block_resized() is not necessary because we use the max_size
+ * value that does not change across resize.
+ */
+};
+r->ok = true;
+
+ram_block_notifier_add(&r->notifier);
+}
+
+void blk_ram_registrar_destroy(BlockRAMRegistrar *r)
+{
+if (r->ok) {
+ram_block_notifier_remove(&r->notifier);
+}
+}
diff --git a/block/meson.build b/block/meson.build
index 500878f082..b7c68b83a3 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -46,6 +46,7 @@ block_ss.add(files(
 ), zstd, zlib, gnutls)
 
 softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
+softmmu_ss.add(files('block-ram-registrar.c'))
 
 if get_option('qcow1').allowed()
   block_ss.add(files('qcow.c'))
-- 
2.37.3

[PATCH v5 02/12] blkio: add libblkio block driver

2022-09-27 Thread Stefan Hajnoczi

libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
high-performance disk I/O. It currently supports io_uring,
virtio-blk-vhost-user, and virtio-blk-vhost-vdpa with additional drivers
under development.

One of the reasons for developing libblkio is that other applications
besides QEMU can use it. This will be particularly useful for
virtio-blk-vhost-user which applications may wish to use for connecting
to qemu-storage-daemon.

libblkio also gives us an opportunity to develop in Rust behind a C API
that is easy to consume from QEMU.

This commit adds io_uring, virtio-blk-vhost-user, and
virtio-blk-vhost-vdpa BlockDrivers to QEMU using libblkio. It will be
easy to add other libblkio drivers since they will share the majority of
code.

For now I/O buffers are copied through bounce buffers if the libblkio
driver requires it. Later commits add an optimization for
pre-registering guest RAM to avoid bounce buffers.
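
As a rough picture of what "copied through bounce buffers" means, the
sketch below gathers an iovec array into one contiguous buffer before a
write and scatters it back after a read. The demo_*() helpers are
hypothetical and use plain memcpy() on a POSIX struct iovec; they are not
the functions used in block/blkio.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Write path: gather the iovec elements into one contiguous bounce buffer. */
static size_t demo_iov_to_bounce(const struct iovec *iov, int iovcnt,
                                 void *bounce, size_t bounce_len)
{
    size_t done = 0;

    for (int i = 0; i < iovcnt && done < bounce_len; i++) {
        size_t n = iov[i].iov_len;

        if (n > bounce_len - done) {
            n = bounce_len - done;   /* never overrun the bounce buffer */
        }
        memcpy((char *)bounce + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}

/* Read path: scatter the bounce buffer contents back into the iovec. */
static size_t demo_bounce_to_iov(const struct iovec *iov, int iovcnt,
                                 const void *bounce, size_t bounce_len)
{
    size_t done = 0;

    for (int i = 0; i < iovcnt && done < bounce_len; i++) {
        size_t n = iov[i].iov_len;

        if (n > bounce_len - done) {
            n = bounce_len - done;
        }
        memcpy(iov[i].iov_base, (const char *)bounce + done, n);
        done += n;
    }
    return done;
}
```

Every request pays for one extra copy of its payload this way, which is
the overhead that pre-registering guest RAM is meant to remove.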

The syntax is:

  --blockdev io_uring,node-name=drive0,filename=test.img,readonly=on|off,cache.direct=on|off

and:

  --blockdev virtio-blk-vhost-vdpa,node-name=drive0,path=/dev/vdpa...,readonly=on|off

Signed-off-by: Stefan Hajnoczi 
Acked-by: Markus Armbruster 
---
 MAINTAINERS   |   6 +
 meson_options.txt |   2 +
 qapi/block-core.json  |  53 ++-
 meson.build   |   9 +
 block/blkio.c | 849 ++
 tests/qtest/modules-test.c|   3 +
 block/meson.build |   1 +
 scripts/meson-buildoptions.sh |   3 +
 8 files changed, 924 insertions(+), 2 deletions(-)
 create mode 100644 block/blkio.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 789172b2a8..878005f65b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3403,6 +3403,12 @@ L: qemu-bl...@nongnu.org
 S: Maintained
 F: block/vdi.c
 
+blkio
+M: Stefan Hajnoczi 
+L: qemu-bl...@nongnu.org
+S: Maintained
+F: block/blkio.c
+
 iSCSI
 M: Ronnie Sahlberg 
 M: Paolo Bonzini 
diff --git a/meson_options.txt b/meson_options.txt
index 79c6af18d5..66128178bf 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -117,6 +117,8 @@ option('bzip2', type : 'feature', value : 'auto',
description: 'bzip2 support for DMG images')
 option('cap_ng', type : 'feature', value : 'auto',
description: 'cap_ng support')
+option('blkio', type : 'feature', value : 'auto',
+   description: 'libblkio block device driver')
 option('bpf', type : 'feature', value : 'auto',
 description: 'eBPF support')
 option('cocoa', type : 'feature', value : 'auto',
diff --git a/qapi/block-core.json b/qapi/block-core.json
index f21fa235f2..5aed0dd436 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2951,11 +2951,16 @@
 'file', 'snapshot-access', 'ftp', 'ftps', 'gluster',
 {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
 {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
-'http', 'https', 'iscsi',
+'http', 'https',
+{ 'name': 'io_uring', 'if': 'CONFIG_BLKIO' },
+'iscsi',
 'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
 'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
 { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
-'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
+'ssh', 'throttle', 'vdi', 'vhdx',
+{ 'name': 'virtio-blk-vhost-user', 'if': 'CONFIG_BLKIO' },
+{ 'name': 'virtio-blk-vhost-vdpa', 'if': 'CONFIG_BLKIO' },
+'vmdk', 'vpc', 'vvfat' ] }
 
 ##
 # @BlockdevOptionsFile:
@@ -3678,6 +3683,42 @@
 '*debug': 'int',
 '*logfile': 'str' } }
 
+##
+# @BlockdevOptionsIoUring:
+#
+# Driver specific block device options for the io_uring backend.
+#
+# @filename: path to the image file
+#
+# Since: 7.2
+##
+{ 'struct': 'BlockdevOptionsIoUring',
+  'data': { 'filename': 'str' } }
+
+##
+# @BlockdevOptionsVirtioBlkVhostUser:
+#
+# Driver specific block device options for the virtio-blk-vhost-user backend.
+#
+# @path: path to the vhost-user UNIX domain socket.
+#
+# Since: 7.2
+##
+{ 'struct': 'BlockdevOptionsVirtioBlkVhostUser',
+  'data': { 'path': 'str' } }
+
+##
+# @BlockdevOptionsVirtioBlkVhostVdpa:
+#
+# Driver specific block device options for the virtio-blk-vhost-vdpa backend.
+#
+# @path: path to the vhost-vdpa character device.
+#
+# Since: 7.2
+##
+{ 'struct': 'BlockdevOptionsVirtioBlkVhostVdpa',
+  'data': { 'path': 'str' } }
+
 ##
 # @IscsiTransport:
 #
@@ -4305,6 +4346,8 @@
'if': 'HAVE_HOST_BLOCK_DEVICE' },
   'http':   'BlockdevOptionsCurlHttp',
   'https':  'BlockdevOptionsCurlHttps',
+  'io_uring':   { 'type': 'BlockdevOptionsIoUring',
+  'if': 'CONFIG_BLKIO' },
   'iscsi':  'BlockdevOptionsIscsi',
   'luks':   'BlockdevOptionsLUKS',
   'nbd':'BlockdevOptionsNbd',
@@ -4327,6

[PATCH v5 06/12] block: add BDRV_REQ_REGISTERED_BUF request flag

2022-09-27 Thread Stefan Hajnoczi

Block drivers may optimize I/O requests accessing buffers previously
registered with bdrv_register_buf(). Checking whether all elements of a
request's QEMUIOVector are within previously registered buffers is
expensive, so we need a hint from the user to avoid costly checks.

Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all
QEMUIOVector elements in an I/O request are known to be within
previously registered buffers.
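
For a sense of the check that the hint lets drivers skip, here is a sketch
of the naive version: every iovec element is compared against every
registered range, so the cost grows with both counts on each request.
DemoRange and the demo_*() functions are hypothetical; a real driver would
likely use a smarter lookup structure, but would still pay something per
request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

typedef struct {
    uintptr_t start;
    size_t len;
} DemoRange;

/* True when the iovec element lies entirely inside the range. */
static bool demo_iov_in_range(const struct iovec *v, const DemoRange *r)
{
    uintptr_t p = (uintptr_t)v->iov_base;

    return p >= r->start &&
           v->iov_len <= r->len &&
           p - r->start <= r->len - v->iov_len;
}

/* O(iovcnt * nranges): the per-request cost the flag is meant to avoid. */
static bool demo_all_registered(const struct iovec *iov, int iovcnt,
                                const DemoRange *ranges, size_t nranges)
{
    for (int i = 0; i < iovcnt; i++) {
        bool found = false;

        for (size_t j = 0; j < nranges; j++) {
            if (demo_iov_in_range(&iov[i], &ranges[j])) {
                found = true;
                break;
            }
        }
        if (!found) {
            return false;
        }
    }
    return true;
}
```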

Always pass the flag through to driver read/write functions. There is
little harm in passing the flag to a driver that does not use it.
Passing the flag to drivers avoids changes across many block drivers.
Filter drivers would need to explicitly support the flag and pass
through to their children when the children support it. That's a lot of
code changes and it's hard to remember to do that everywhere, leading to
silent reduced performance when the flag is accidentally dropped.

The only problematic scenario with the approach in this patch is when a
driver passes the flag through to internal I/O requests that don't use
the same I/O buffer. In that case the hint may be set when it should
actually be clear. This is a rare case though so the risk is low.

Some drivers have assert(!flags), which no longer works when
BDRV_REQ_REGISTERED_BUF is passed in. These assertions aren't very
useful anyway since the functions are called almost exclusively by
bdrv_driver_preadv/pwritev() so if we get flags handling right there
then the assertion is not needed.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-common.h |  9 ++
 block.c  | 14 +
 block/blkverify.c|  4 +--
 block/crypto.c   |  4 +--
 block/file-posix.c   |  1 -
 block/gluster.c  |  1 -
 block/io.c   | 61 ++--
 block/mirror.c   |  2 ++
 block/nbd.c  |  1 -
 block/parallels.c|  1 -
 block/qcow.c |  2 --
 block/qed.c  |  1 -
 block/raw-format.c   |  2 ++
 block/replication.c  |  1 -
 block/ssh.c  |  1 -
 block/vhdx.c |  1 -
 16 files changed, 69 insertions(+), 37 deletions(-)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index fdb7306e78..061606e867 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -80,6 +80,15 @@ typedef enum {
  */
 BDRV_REQ_MAY_UNMAP  = 0x4,
 
+/*
+ * An optimization hint when all QEMUIOVector elements are within
+ * previously registered bdrv_register_buf() memory ranges.
+ *
+ * Code that replaces the user's QEMUIOVector elements with bounce buffers
+ * must take care to clear this flag.
+ */
+BDRV_REQ_REGISTERED_BUF = 0x8,
+
 BDRV_REQ_FUA= 0x10,
 BDRV_REQ_WRITE_COMPRESSED   = 0x20,
 
diff --git a/block.c b/block.c
index bc85f46eed..70abbf774e 100644
--- a/block.c
+++ b/block.c
@@ -1640,6 +1640,20 @@ static int bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv,
 goto open_failed;
 }
 
+assert(!(bs->supported_read_flags & ~BDRV_REQ_MASK));
+assert(!(bs->supported_write_flags & ~BDRV_REQ_MASK));
+
+/*
+ * Always allow the BDRV_REQ_REGISTERED_BUF optimization hint. This saves
+ * drivers that pass read/write requests through to a child the trouble of
+ * declaring support explicitly.
+ *
+ * Drivers must not propagate this flag accidentally when they initiate I/O
+ * to a bounce buffer. That case should be rare though.
+ */
+bs->supported_read_flags |= BDRV_REQ_REGISTERED_BUF;
+bs->supported_write_flags |= BDRV_REQ_REGISTERED_BUF;
+
 ret = refresh_total_sectors(bs, bs->total_sectors);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not refresh total sector count");
diff --git a/block/blkverify.c b/block/blkverify.c
index e4a37af3b2..d624f4fd05 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -235,8 +235,8 @@ blkverify_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
 qemu_iovec_init(_qiov, qiov->niov);
 qemu_iovec_clone(_qiov, qiov, buf);
 
-ret = blkverify_co_prwv(bs, , offset, bytes, qiov, _qiov, flags,
-false);
+ret = blkverify_co_prwv(bs, , offset, bytes, qiov, _qiov,
+flags & ~BDRV_REQ_REGISTERED_BUF, false);
 
 cmp_offset = qemu_iovec_compare(qiov, _qiov);
 if (cmp_offset != -1) {
diff --git a/block/crypto.c b/block/crypto.c
index 7a57774b76..c7365598a7 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -410,7 +410,6 @@ block_crypto_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
 uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block);
 uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block);
 
-assert(!flags);
 assert(payload_offset < INT64_MAX);

[PATCH v5 03/12] numa: call ->ram_block_removed() in ram_block_notifier_remove()

2022-09-27 Thread Stefan Hajnoczi

When a RAMBlockNotifier is added, ->ram_block_added() is called with all
existing RAMBlocks. There is no equivalent ->ram_block_removed() call
when a RAMBlockNotifier is removed.

The util/vfio-helpers.c code (the sole user of RAMBlockNotifier) is fine
with this asymmetry because it does not rely on RAMBlockNotifier for
cleanup. It walks its internal list of DMA mappings and unmaps them by
itself.

Future users of RAMBlockNotifier may not have an internal data structure
that records added RAMBlocks so they will need ->ram_block_removed()
callbacks.

This patch makes ram_block_notifier_remove() symmetric with respect to
callbacks. Now util/vfio-helpers.c needs to unmap remaining DMA mappings
after ram_block_notifier_remove() has been called. This is necessary
since users like block/nvme.c may create additional DMA mappings that do
not originate from the RAMBlockNotifier.
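
The symmetry can be pictured with a small standalone model: adding a
notifier replays "added" for every existing block, and removing it replays
"removed" for every block. Everything named Demo* or demo_*() here is
hypothetical, and a fixed block count stands in for the RAMBlock list.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NBLOCKS 3   /* pretend three RAMBlocks already exist */

typedef struct DemoNotifier DemoNotifier;
struct DemoNotifier {
    void (*added)(DemoNotifier *n, size_t block);
    void (*removed)(DemoNotifier *n, size_t block);   /* may be NULL */
    int live;                                         /* blocks currently tracked */
};

/* Adding a notifier replays "added" for every existing block. */
static void demo_notifier_add(DemoNotifier *n)
{
    for (size_t i = 0; i < DEMO_NBLOCKS; i++) {
        if (n->added) {
            n->added(n, i);
        }
    }
}

/* Removing a notifier replays "removed" for every block, if implemented. */
static void demo_notifier_remove(DemoNotifier *n)
{
    if (n->removed) {
        for (size_t i = 0; i < DEMO_NBLOCKS; i++) {
            n->removed(n, i);
        }
    }
}

static void demo_count_added(DemoNotifier *n, size_t block)
{
    (void)block;
    n->live++;
}

static void demo_count_removed(DemoNotifier *n, size_t block)
{
    (void)block;
    n->live--;
}
```

A user that tracks nothing itself ends up with a balanced count after
removal, without keeping its own list of blocks.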

Reviewed-by: David Hildenbrand 
Signed-off-by: Stefan Hajnoczi 
---
 hw/core/numa.c  | 17 +
 util/vfio-helpers.c |  5 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 26d8e5f616..31e6fe1caa 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -822,6 +822,19 @@ static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
 return 0;
 }
 
+static int ram_block_notify_remove_single(RAMBlock *rb, void *opaque)
+{
+const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+const ram_addr_t size = qemu_ram_get_used_length(rb);
+void *host = qemu_ram_get_host_addr(rb);
+RAMBlockNotifier *notifier = opaque;
+
+if (host) {
+notifier->ram_block_removed(notifier, host, size, max_size);
+}
+return 0;
+}
+
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
 QLIST_INSERT_HEAD(_list.ramblock_notifiers, n, next);
@@ -835,6 +848,10 @@ void ram_block_notifier_add(RAMBlockNotifier *n)
 void ram_block_notifier_remove(RAMBlockNotifier *n)
 {
 QLIST_REMOVE(n, next);
+
+if (n->ram_block_removed) {
+qemu_ram_foreach_block(ram_block_notify_remove_single, n);
+}
 }
 
 void ram_block_notify_add(void *host, size_t size, size_t max_size)
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 5ba01177bf..0d1520caac 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -847,10 +847,13 @@ void qemu_vfio_close(QEMUVFIOState *s)
 if (!s) {
 return;
 }
+
+ram_block_notifier_remove(&s->ram_notifier);
+
 for (i = 0; i < s->nr_mappings; ++i) {
 qemu_vfio_undo_mapping(s, &s->mappings[i], NULL);
 }
-ram_block_notifier_remove(&s->ram_notifier);
+
 g_free(s->usable_iova_ranges);
 s->nb_iova_ranges = 0;
 qemu_vfio_reset(s);
-- 
2.37.3

[PATCH v5 00/12] blkio: add libblkio BlockDriver

2022-09-27 Thread Stefan Hajnoczi

v5:
- Drop "RFC" since libblkio 1.0 has been released and the library API is stable
- Disable BDRV_REQ_REGISTERED_BUF if we run out of blkio_mem_regions. The
  bounce buffer slow path is taken when there are not enough blkio_mem_regions
  to cover guest RAM. [Hanna & David Hildenbrand]
- Call ram_block_discard_disable() when mem-region-pinned property is true or
  absent [David Hildenbrand]
- Use a bounce buffer pool instead of allocating/freeing a buffer for each
  request. This reduces the number of blkio_mem_regions required for bounce
  buffers to 1 and avoids frequent blkio_mem_region_map/unmap() calls.
- Switch to .bdrv_co_*() instead of .bdrv_aio_*(). Needed for the bounce buffer
  pool's CoQueue.
v4:
- Patch 1:
  - Add virtio-blk-vhost-user driver [Kevin]
  - Drop .bdrv_parse_filename() and .bdrv_needs_filename for
    virtio-blk-vhost-vdpa [Stefano]
  - Add copyright and license header [Hanna]
  - Drop .bdrv_parse_filename() in favor of --blockdev or json: [Hanna]
  - Clarify that "filename" is always non-NULL for io_uring [Hanna]
  - Check that virtio-blk-vhost-vdpa "path" option is non-NULL [Hanna]
  - Fix virtio-blk-vhost-vdpa cache.direct=off logic [Hanna]
  - Use macros for driver names [Hanna]
  - Assert that the driver name is valid [Hanna]
  - Update "readonly" property name to "read-only" [Hanna]
  - Call blkio_detach_aio_context() in blkio_close() [Hanna]
  - Avoid uint32_t * to int * casts in blkio_refresh_limits() [Hanna]
  - Remove write zeroes and discard from the todo list [Hanna]
  - Use PRIu32 instead of %d for uint32_t [Hanna]
  - Fix error messages with buf-alignment instead of optimal-io-size [Hanna]
  - Call map/unmap APIs since libblkio alloc/free APIs no longer do that
  - Update QAPI schema QEMU version to 7.2
- Patch 5:
  - Expand the BDRV_REQ_REGISTERED_BUF flag passthrough and drop assert(!flags)
    in drivers [Hanna]
- Patch 7:
  - Fix BLK->BDRV typo [Hanna]
  - Make BlockRAMRegistrar handle failure [Hanna]
- Patch 8:
  - Replace memory_region_get_fd() approach with qemu_ram_get_fd()
- Patch 10:
  - Use (void)ret; to discard unused return value [Hanna]
  - libblkio's blkio_unmap_mem_region() API no longer has a return value
  - Check for registered bufs that cross RAMBlocks [Hanna]
- Patch 11:
  - Handle bdrv_register_buf() errors [Hanna]
v3:
- Add virtio-blk-vhost-vdpa for vdpa-blk devices including VDUSE
- Add discard and write zeroes support
- Rebase and adopt latest libblkio APIs
v2:
- Add BDRV_REQ_REGISTERED_BUF to bs.supported_write_flags [Stefano]
- Use new blkioq_get_num_completions() API
- Implement .bdrv_refresh_limits()

This patch series adds a QEMU BlockDriver for libblkio
(https://gitlab.com/libblkio/libblkio/), a library for high-performance block
device I/O. This work was presented at KVM Forum 2022 and slides are available
here:
https://static.sched.com/hosted_files/kvmforum2022/8c/libblkio-kvm-forum-2022.pdf

The second patch adds the core BlockDriver and most of the libblkio API usage.
Three libblkio drivers are included:
- io_uring
- virtio-blk-vhost-user
- virtio-blk-vhost-vdpa

The remainder of the patch series reworks the existing QEMU bdrv_register_buf()
API so virtio-blk emulation can efficiently map guest RAM for libblkio - some
libblkio drivers require that I/O buffer memory is pre-registered (think VFIO,
vhost, etc).

Vladimir requested performance results that show the effect of the
BDRV_REQ_REGISTERED_BUF flag. I ran the patches against qemu-storage-daemon's
vhost-user-blk export with iodepth=1 bs=512 to see the per-request overhead due
to bounce buffer allocation/mapping:

Name            IOPS      Error
bounce-buf       4373.81  ± 0.01%
registered-buf  13062.80  ± 0.67%

The version with the BDRV_REQ_REGISTERED_BUF optimization is about 3x faster.

See the BlockDriver struct in block/blkio.c for a list of APIs that still need
to be implemented. The core functionality is covered.

Regarding the design: each libblkio driver is a separately named BlockDriver.
That means there is an "io_uring" BlockDriver and not a generic "libblkio"
BlockDriver. This way QAPI and open parameters are type-safe and mandatory
parameters can be checked by QEMU.

Stefan Hajnoczi (12):
  coroutine: add flag to re-queue at front of CoQueue
  blkio: add libblkio block driver
  numa: call ->ram_block_removed() in ram_block_notifer_remove()
  block: pass size to bdrv_unregister_buf()
  block: use BdrvRequestFlags type for supported flag fields
  block: add BDRV_REQ_REGISTERED_BUF request flag
  block: return errors from bdrv_register_buf()
  block: add BlockRAMRegistrar
  exec/cpu-common: add qemu_ram_get_fd()
  stubs: add qemu_ram_block_from_host() and qemu_ram_get_fd()
  blkio: implement BDRV_REQ_REGISTERED_BUF optimization
  virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint

 MAINTAINERS |7 +
 meson_options.txt   |2 +
 qapi/block-core.json

[PATCH v5 01/12] coroutine: add flag to re-queue at front of CoQueue

2022-09-27 Thread Stefan Hajnoczi

When a coroutine wakes up it may determine that it must re-queue.
Normally coroutines are pushed onto the back of the CoQueue, but for
fairness it may be necessary to push a re-queued coroutine onto the front
instead.

Add a flag to specify that the coroutine should be pushed onto the front
of the CoQueue. A later patch will use this to ensure fairness in the
bounce buffer CoQueue used by the blkio BlockDriver.

Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/coroutine.h   | 15 +--
 util/qemu-coroutine-lock.c |  9 +++--
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 08c5bb3c76..3f418a67f6 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -198,14 +198,25 @@ typedef struct CoQueue {
  */
 void qemu_co_queue_init(CoQueue *queue);
 
+typedef enum {
+/*
+ * Enqueue at front instead of back. Use this to re-queue a request when
+ * its wait condition is not satisfied after being woken up.
+ */
+CO_QUEUE_WAIT_FRONT = 0x1,
+} CoQueueWaitFlags;
+
 /**
  * Adds the current coroutine to the CoQueue and transfers control to the
  * caller of the coroutine.  The mutex is unlocked during the wait and
  * locked again afterwards.
  */
 #define qemu_co_queue_wait(queue, lock) \
-qemu_co_queue_wait_impl(queue, QEMU_MAKE_LOCKABLE(lock))
-void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock);
+qemu_co_queue_wait_impl(queue, QEMU_MAKE_LOCKABLE(lock), 0)
+#define qemu_co_queue_wait_flags(queue, lock, flags) \
+qemu_co_queue_wait_impl(queue, QEMU_MAKE_LOCKABLE(lock), (flags))
+void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock,
+  CoQueueWaitFlags flags);
 
 /**
  * Removes the next coroutine from the CoQueue, and queue it to run after
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 9ad24ab1af..0516bd2ff3 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -39,10 +39,15 @@ void qemu_co_queue_init(CoQueue *queue)
 QSIMPLEQ_INIT(&queue->entries);
 }
 
-void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock)
+void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock,
+  CoQueueWaitFlags flags)
 {
 Coroutine *self = qemu_coroutine_self();
-QSIMPLEQ_INSERT_TAIL(&queue->entries, self, co_queue_next);
+if (flags & CO_QUEUE_WAIT_FRONT) {
+QSIMPLEQ_INSERT_HEAD(&queue->entries, self, co_queue_next);
+} else {
+QSIMPLEQ_INSERT_TAIL(&queue->entries, self, co_queue_next);
+}
 
 if (lock) {
 qemu_lockable_unlock(lock);
-- 
2.37.3
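
The effect of the new flag on wake-up order can be illustrated outside QEMU. What follows is only a minimal sketch, assuming a plain singly linked queue in place of CoQueue: `Waiter`, `WaitQueue`, `WAIT_FRONT`, `wait_queue_add()` and `wait_queue_pop()` are hypothetical names that do not exist in QEMU, and no coroutines or locks are involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a waiting coroutine; not QEMU's Coroutine. */
typedef struct Waiter {
    int id;
    struct Waiter *next;
} Waiter;

/* Hypothetical stand-in for CoQueue: a linked queue with a tail pointer. */
typedef struct {
    Waiter *head;
    Waiter *tail;
} WaitQueue;

typedef enum {
    WAIT_FRONT = 0x1, /* plays the role of CO_QUEUE_WAIT_FRONT in the patch */
} WaitFlags;

static void wait_queue_init(WaitQueue *q)
{
    q->head = NULL;
    q->tail = NULL;
}

/* Enqueue at the back by default, or at the front when WAIT_FRONT is set. */
static void wait_queue_add(WaitQueue *q, Waiter *w, unsigned flags)
{
    assert(q && w);
    if (flags & WAIT_FRONT) {
        w->next = q->head;
        q->head = w;
        if (!q->tail) {
            q->tail = w;
        }
    } else {
        w->next = NULL;
        if (q->tail) {
            q->tail->next = w;
        } else {
            q->head = w;
        }
        q->tail = w;
    }
}

/* Remove and return the waiter that would be woken next, or NULL if empty. */
static Waiter *wait_queue_pop(WaitQueue *q)
{
    Waiter *w = q->head;
    if (w) {
        q->head = w->next;
        if (!q->head) {
            q->tail = NULL;
        }
        w->next = NULL;
    }
    return w;
}
```

In this sketch, a waiter that is popped, finds its condition still unsatisfied, and is re-added with WAIT_FRONT keeps its place ahead of the waiters that queued after it. Re-adding it with flags 0 would send it behind them, which is the fairness problem the commit message describes.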

Re: [PATCH v5] 9pfs: use GHashTable for fid table

2022-09-27 Thread Christian Schoenebeck

On Tuesday, 27 September 2022 17:14:57 CEST Linus Heckemann wrote:
> Linus Heckemann  writes:
> >  static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
> >  {
> >  
> >  V9fsState *s = pdu->s;
> >  V9fsFidState *fidp;
> > 
> > +GList *freeing;
> > +/* Get a list of all the values (fid states) in the table, which we then... */
> > +g_autoptr(GList) fids = g_hash_table_get_values(s->fids);
> > 
> > -/* Free all fids */
> > -while (!QSIMPLEQ_EMPTY(&s->fid_list)) {
> > -/* Get fid */
> > -fidp = QSIMPLEQ_FIRST(&s->fid_list);
> > -fidp->ref++;
> > +/* ... remove from the table, taking over ownership. */
> > +g_hash_table_steal_all(s->fids);
> > 
> > -/* Clunk fid */
> > -QSIMPLEQ_REMOVE(&s->fid_list, fidp, V9fsFidState, next);
> > +/*
> > + * This allows us to release our references to them asynchronously without
> > + * iterating over the hash table and risking iterator invalidation
> > + * through concurrent modifications.
> > + */
> > +for (freeing = fids; freeing; freeing = freeing->next) {
> > +fidp = freeing->data;
> > +fidp->ref++;
> > 
> >  fidp->clunked = true;
> > 
> > -
> > 
> >  put_fid(pdu, fidp);
> >  
> >  }
> >  
> >  }
> 
> I'm not sure if this implementation is correct. I'm concerned that it
> may result in dangling references, but haven't been able to find a
> client that will send the TVERSION request on a connection that's
> already been used in other ways, as opposed to when the connection is
> first established. I suspect this will be very rare in general, so it
> might be good to have a test case somewhere.

Always welcome! :)
https://wiki.qemu.org/Documentation/9p#Test_Cases

If you do, then please add the test as a separate patch.

Best regards,
Christian Schoenebeck
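
For readers following the virtfs_reset() discussion, the detach-then-release order in the quoted hunk can be sketched without GLib. This is only an illustration under stated assumptions: `FidState`, `FidTable`, `put_ref()` and `reset_table()` are hypothetical stand-ins (a fixed array replaces the GHashTable), nothing yields, and the sketch says nothing about whether the real code can leave dangling references.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SLOTS 8 /* hypothetical fixed capacity for the sketch */

/* Hypothetical stand-in for V9fsFidState. */
typedef struct {
    int fid;
    int ref;
    int clunked;
} FidState;

/* Hypothetical stand-in for the fid hash table. */
typedef struct {
    FidState *slots[TABLE_SLOTS];
} FidTable;

/* Drop one reference; the real put_fid() may also free the fid and yield. */
static void put_ref(FidState *f)
{
    assert(f->ref > 0);
    f->ref--;
}

/*
 * Detach every entry from the table first, then release the entries from a
 * private snapshot. Returns the number of entries that were detached.
 */
static size_t reset_table(FidTable *t)
{
    FidState *snapshot[TABLE_SLOTS];
    size_t n = 0;

    /* Step 1: copy out all values and empty the table ("steal all"). */
    for (size_t i = 0; i < TABLE_SLOTS; i++) {
        if (t->slots[i]) {
            snapshot[n++] = t->slots[i];
            t->slots[i] = NULL;
        }
    }

    /* Step 2: walk the snapshot only; the table itself is not iterated. */
    for (size_t i = 0; i < n; i++) {
        FidState *f = snapshot[i];
        f->ref++;        /* hold a reference while marking the fid */
        f->clunked = 1;
        put_ref(f);      /* release the reference taken above */
    }
    return n;
}
```

Because step 2 never touches the table, anything that modifies the table while references are being released cannot invalidate the loop, which is the property the comment in the patch is after.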

Re: Should we maybe move Cirrus-CI jobs away from Gitlab again?

2022-09-27 Thread Stefan Hajnoczi

On Tue, 27 Sept 2022 at 15:04, Thomas Huth  wrote:
>
> On 27/09/2022 20.47, Stefan Hajnoczi wrote:
> > On Tue, 27 Sept 2022 at 14:40, Thomas Huth  wrote:
> >>
> >> On 27/09/2022 19.57, Daniel P. Berrangé wrote:
> >>> On Tue, Sep 27, 2022 at 01:36:20PM -0400, Stefan Hajnoczi wrote:
>  On Tue, 27 Sept 2022 at 11:54, Daniel P. Berrangé  
>  wrote:
> >
> > On Tue, Sep 27, 2022 at 11:44:45AM -0400, Stefan Hajnoczi wrote:
> >> On Tue, 27 Sept 2022 at 05:02, Thomas Huth  wrote:
> >>> now that Gitlab is giving us pressure on the amount of free CI 
> >>> minutes, I
> >>> wonder whether we should maybe move the Cirrus-CI jobs out of the 
> >>> gitlab-CI
> >>> dashboard again? We could add the jobs to our .cirrus-ci.yml file 
> >>> instead,
> >>> like we did it in former times...
> >>>
> >>> Big advantage would be of course that the time for those jobs would 
> >>> not
> >>> count in the Gitlab-CI minutes anymore. Disadvantage is of course 
> >>> that they
> >>> do not show up in the gitlab-CI dashboard anymore, so there is no more
> >>> e-mail notification about failed jobs, and you have to push to 
> >>> github, too,
> >>> and finally check the results manually on cirrus-ci.com ...
> >>
> >> My understanding is that .gitlab-ci.d/cirrus.yml uses a GitLab CI job
> >> to run the cirrus-run container image that forwards jobs to Cirrus-CI.
> >> So GitLab CI resources are consumed waiting for Cirrus-CI to finish.
> >>
> >> This shouldn't affect gitlab.com/qemu-project where there are private
> >> runners that do not consume GitLab CI minutes.
> >>
> >> Individual developers are affected though because they most likely
> >> rely on the GitLab shared runner minutes quota.
> >
> > NB, none of the jobs should ever be run automatically anymore in
> > QEMU CI pipelines. It always requires the maintainer to set the
> > env var when pushing to git, to explicitly create a pipeline.
> > You can then selectively start each individual job as desired.
> 
>  Cirrus CI is not automatically started when pushing to a personal
>  GitLab repo? If starting it requires manual action anyway then I think
>  nothing needs to be changed here.
> >>>
> >>> No pipeline at all is created unless you do
> >>>
> >>> git push -o ci.variable=QEMU_CI=1 
> >>>
> >>> that creates the pipeline but doesn't run any jobs - they're manual
> >>> start.
> >>
> >> Yes, sure, the jobs are not started automatically. But I *do* want to run
> >> the jobs before sending pull requests - but since the gitlab-CI minutes are
> >> now very limited, I'd like to avoid burning these minutes via gitlab and
> >> start those jobs directly on cirrus-ci.com again. For that the jobs would
> >> need to be moved to our .cirrus-ci.yml file again.
> >>
> >> Well, maybe we could also have both, jobs via cirrus-run for those who want
> >> to see them in their gitlab-CI dashboard, and via .cirrus-ci.yml for those
> >> who want to avoid burning CI minutes on Gitlab. It's a little bit of
> >> double-maintenance, but maybe acceptable?
> >
> > I just noticed that qemu.git/master doesn't run Cirrus-CI. I guess it
> > hasn't been set up in our GitLab project.
> >
> > Since it's not enabled for qemu.git/master nothing will change from my
> > perspective. Feel free to change it as you wish.
>
> It's only run for the "staging" branch, I think. The idea was that things
> get tested before merge on the "staging" branch, then there is no need
> anymore to rerun everything when it gets pushed into the "master" branch.

I don't see a cirrus job:
https://gitlab.com/qemu-project/qemu/-/pipelines/652051335

Stefan

Re: [PATCH v5] 9pfs: use GHashTable for fid table

2022-09-27 Thread Christian Schoenebeck

On Tuesday, 27 September 2022 16:25:03 CEST Linus Heckemann wrote:
> The previous implementation would iterate over the fid table for
> lookup operations, resulting in an operation with O(n) complexity on
> the number of open files and poor cache locality -- for every open,
> stat, read, write, etc operation.
> 
> This change uses a hashtable for this instead, significantly improving
> the performance of the 9p filesystem. The runtime of NixOS's simple
> installer test, which copies ~122k files totalling ~1.8GiB from 9p,
> decreased by a factor of about 10.
> 
> Signed-off-by: Linus Heckemann 
> Reviewed-by: Philippe Mathieu-Daudé 
> Reviewed-by: Greg Kurz 
> Message-Id: <20220908112353.289267-1-...@sphalerite.org>
> [CS: - Retain BUG_ON(f->clunked) in get_fid().
>  - Add TODO comment in clunk_fid(). ]
> Signed-off-by: Christian Schoenebeck 
> ---
> This squashes the separately submitted patch "9pfs: avoid iterator
> invalidation in v9fs_mark_fids_unreclaim"
> (20220926124207.1325763-1-...@sphalerite.org) into the previous
> version of this change.
> 
> I've skipped v4, because the former is arguably a poorly submitted v4.
> 
> I've also addressed Christian Schoenebeck's comments on the former:
> 
> * (v9fs_mark_fids_unreclaim) switched to g_autoptr for the array
>   storing the fids intermediately in preparation for reopening
> 
> * (v9fs_mark_fids_unreclaim) restored the accidentally removed
>   FID_NON_RECLAIMABLE mark
> 
> * (v9fs_mark_fids_unreclaim) moved fid reference release into a third
>   loop, which is run even if an error is encountered during a reopen
>   operation, in order to avoid leaking references to fids.
> 
> * (v9fs_reset) implemented logic to avoid the same iterator
>   invalidation problem
> 
> I've also added a comment explaining the exact role of
> v9fs_mark_fids_unreclaim, since it's not entirely obvious at a glance.
> 
> 
>  hw/9pfs/9p.c | 188 ---
>  hw/9pfs/9p.h |   2 +-
>  2 files changed, 106 insertions(+), 84 deletions(-)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index aebadeaa03..0e485cb631 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -282,33 +282,32 @@ static V9fsFidState *coroutine_fn get_fid(V9fsPDU *pdu, int32_t fid)
>  V9fsFidState *f;
>  V9fsState *s = pdu->s;
> 
> -QSIMPLEQ_FOREACH(f, &s->fid_list, next) {
> +f = g_hash_table_lookup(s->fids, GINT_TO_POINTER(fid));
> +if (f) {
>  BUG_ON(f->clunked);
> -if (f->fid == fid) {
> -/*
> - * Update the fid ref upfront so that
> - * we don't get reclaimed when we yield
> - * in open later.
> - */
> -f->ref++;
> -/*
> - * check whether we need to reopen the
> - * file. We might have closed the fd
> - * while trying to free up some file
> - * descriptors.
> - */
> -err = v9fs_reopen_fid(pdu, f);
> -if (err < 0) {
> -f->ref--;
> -return NULL;
> -}
> -/*
> - * Mark the fid as referenced so that the LRU
> - * reclaim won't close the file descriptor
> - */
> -f->flags |= FID_REFERENCED;
> -return f;
> +/*
> + * Update the fid ref upfront so that
> + * we don't get reclaimed when we yield
> + * in open later.
> + */
> +f->ref++;
> +/*
> + * check whether we need to reopen the
> + * file. We might have closed the fd
> + * while trying to free up some file
> + * descriptors.
> + */
> +err = v9fs_reopen_fid(pdu, f);
> +if (err < 0) {
> +f->ref--;
> +return NULL;
>  }
> +/*
> + * Mark the fid as referenced so that the LRU
> + * reclaim won't close the file descriptor
> + */
> +f->flags |= FID_REFERENCED;
> +return f;
>  }
>  return NULL;
>  }
> @@ -317,12 +316,9 @@ static V9fsFidState *alloc_fid(V9fsState *s, int32_t fid)
>  {
>  V9fsFidState *f;
> 
> -QSIMPLEQ_FOREACH(f, &s->fid_list, next) {
> +if (g_hash_table_contains(s->fids, GINT_TO_POINTER(fid))) {
>  /* If fid is already there return NULL */
> -BUG_ON(f->clunked);

Not a big deal, but I am starting to wonder whether to keep BUG_ON() here as
well. That would require using g_hash_table_lookup() here instead of
g_hash_table_contains(). Not that I would insist.
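
The difference between a "contains" check and a "lookup" in this context can be sketched as follows. This is a minimal illustration, assuming a small open-addressing table instead of GHashTable: `FidEntry`, `FidMap`, `fid_map_init()`, `fid_lookup()` and `fid_alloc()` are hypothetical names, entries are never removed, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 16 /* hypothetical capacity; must be a power of two */

/* Hypothetical stand-in for a fid table entry. */
typedef struct {
    int32_t fid;
    int clunked;
    int used;
} FidEntry;

typedef struct {
    FidEntry slots[NSLOTS];
} FidMap;

static void fid_map_init(FidMap *m)
{
    for (size_t i = 0; i < NSLOTS; i++) {
        m->slots[i].fid = 0;
        m->slots[i].clunked = 0;
        m->slots[i].used = 0;
    }
}

/* Return the entry for fid, or NULL. Linear probing from the hashed slot. */
static FidEntry *fid_lookup(FidMap *m, int32_t fid)
{
    size_t start = (uint32_t)fid & (NSLOTS - 1);

    for (size_t i = 0; i < NSLOTS; i++) {
        FidEntry *e = &m->slots[(start + i) & (NSLOTS - 1)];
        if (!e->used) {
            return NULL; /* an empty slot ends the probe sequence */
        }
        if (e->fid == fid) {
            return e;
        }
    }
    return NULL;
}

/* Insert a new fid; returns NULL if it already exists or the map is full. */
static FidEntry *fid_alloc(FidMap *m, int32_t fid)
{
    FidEntry *existing = fid_lookup(m, fid);
    size_t start = (uint32_t)fid & (NSLOTS - 1);

    if (existing) {
        /* Only a lookup hands back the entry, so its state can be checked. */
        assert(!existing->clunked);
        return NULL;
    }
    for (size_t i = 0; i < NSLOTS; i++) {
        FidEntry *e = &m->slots[(start + i) & (NSLOTS - 1)];
        if (!e->used) {
            e->fid = fid;
            e->clunked = 0;
            e->used = 1;
            return e;
        }
    }
    return NULL;
}
```

A pure "contains" check would answer only yes or no, so the sanity check on the existing entry would have nowhere to go; that is the trade-off raised in the comment above.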

> -if (f->fid == fid) {
> -return NULL;
> -}
> +return NULL;
>  }
>  f = g_new0(V9fsFidState, 1);
>  f->fid = fid;
> @@ -333,7 +329,7 @@ static V9fsFidState *alloc_fid(V9fsState *s, int32_t fid)
>  * reclaim won't close the file descriptor
>   */
>  f->flags |= FID_REFERENCED;
> -QSIMPLEQ_INSERT_TAIL(&s->fid_list, f, next);
> +g_hash_table_insert(s->fids, GINT_TO_POINTER(fid),

Re: Should we maybe move Cirrus-CI jobs away from Gitlab again?

2022-09-27 Thread Thomas Huth


On 27/09/2022 20.47, Stefan Hajnoczi wrote:

On Tue, 27 Sept 2022 at 14:40, Thomas Huth  wrote:


On 27/09/2022 19.57, Daniel P. Berrangé wrote:

On Tue, Sep 27, 2022 at 01:36:20PM -0400, Stefan Hajnoczi wrote:

On Tue, 27 Sept 2022 at 11:54, Daniel P. Berrangé  wrote:


On Tue, Sep 27, 2022 at 11:44:45AM -0400, Stefan Hajnoczi wrote:

On Tue, 27 Sept 2022 at 05:02, Thomas Huth  wrote:

now that Gitlab is giving us pressure on the amount of free CI minutes, I
wonder whether we should maybe move the Cirrus-CI jobs out of the gitlab-CI
dashboard again? We could add the jobs to our .cirrus-ci.yml file instead,
like we did it in former times...

Big advantage would be of course that the time for those jobs would not
count in the Gitlab-CI minutes anymore. Disadvantage is of course that they
do not show up in the gitlab-CI dashboard anymore, so there is no more
e-mail notification about failed jobs, and you have to push to github, too,
and finally check the results manually on cirrus-ci.com ...


My understanding is that .gitlab-ci.d/cirrus.yml uses a GitLab CI job
to run the cirrus-run container image that forwards jobs to Cirrus-CI.
So GitLab CI resources are consumed waiting for Cirrus-CI to finish.

This shouldn't affect gitlab.com/qemu-project where there are private
runners that do not consume GitLab CI minutes.

Individual developers are affected though because they most likely
rely on the GitLab shared runner minutes quota.


NB, none of the jobs should ever be run automatically anymore in
QEMU CI pipelines. It always requires the maintainer to set the
env var when pushing to git, to explicitly create a pipeline.
You can then selectively start each individual job as desired.


Cirrus CI is not automatically started when pushing to a personal
GitLab repo? If starting it requires manual action anyway then I think
nothing needs to be changed here.


No pipeline at all is created unless you do

git push -o ci.variable=QEMU_CI=1 

that creates the pipeline but doesn't run any jobs - they're manual
start.


Yes, sure, the jobs are not started automatically. But I *do* want to run
the jobs before sending pull requests - but since the gitlab-CI minutes are
now very limited, I'd like to avoid burning these minutes via gitlab and
start those jobs directly on cirrus-ci.com again. For that the jobs would
need to be moved to our .cirrus-ci.yml file again.

Well, maybe we could also have both, jobs via cirrus-run for those who want
to see them in their gitlab-CI dashboard, and via .cirrus-ci.yml for those
who want to avoid burning CI minutes on Gitlab. It's a little bit of
double-maintenance, but maybe acceptable?


I just noticed that qemu.git/master doesn't run Cirrus-CI. I guess it
hasn't been set up in our GitLab project.

Since it's not enabled for qemu.git/master nothing will change from my
perspective. Feel free to change it as you wish.


It's only run for the "staging" branch, I think. The idea was that things 
get tested before merge on the "staging" branch, then there is no need 
anymore to rerun everything when it gets pushed into the "master" branch.


 Thomas

Re: Should we maybe move Cirrus-CI jobs away from Gitlab again?

2022-09-27 Thread Stefan Hajnoczi

On Tue, 27 Sept 2022 at 14:40, Thomas Huth  wrote:
>
> On 27/09/2022 19.57, Daniel P. Berrangé wrote:
> > On Tue, Sep 27, 2022 at 01:36:20PM -0400, Stefan Hajnoczi wrote:
> >> On Tue, 27 Sept 2022 at 11:54, Daniel P. Berrangé  
> >> wrote:
> >>>
> >>> On Tue, Sep 27, 2022 at 11:44:45AM -0400, Stefan Hajnoczi wrote:
>  On Tue, 27 Sept 2022 at 05:02, Thomas Huth  wrote:
> > now that Gitlab is giving us pressure on the amount of free CI minutes, 
> > I
> > wonder whether we should maybe move the Cirrus-CI jobs out of the 
> > gitlab-CI
> > dashboard again? We could add the jobs to our .cirrus-ci.yml file 
> > instead,
> > like we did it in former times...
> >
> > Big advantage would be of course that the time for those jobs would not
> > count in the Gitlab-CI minutes anymore. Disadvantage is of course that 
> > they
> > do not show up in the gitlab-CI dashboard anymore, so there is no more
> > e-mail notification about failed jobs, and you have to push to github, 
> > too,
> > and finally check the results manually on cirrus-ci.com ...
> 
>  My understanding is that .gitlab-ci.d/cirrus.yml uses a GitLab CI job
>  to run the cirrus-run container image that forwards jobs to Cirrus-CI.
>  So GitLab CI resources are consumed waiting for Cirrus-CI to finish.
> 
>  This shouldn't affect gitlab.com/qemu-project where there are private
>  runners that do not consume GitLab CI minutes.
> 
>  Individual developers are affected though because they most likely
>  rely on the GitLab shared runner minutes quota.
> >>>
> >>> NB, none of the jobs should ever be run automatically anymore in
> >>> QEMU CI pipelines. It always requires the maintainer to set the
> >>> env var when pushing to git, to explicitly create a pipeline.
> >>> You can then selectively start each individual job as desired.
> >>
> >> Cirrus CI is not automatically started when pushing to a personal
> >> GitLab repo? If starting it requires manual action anyway then I think
> >> nothing needs to be changed here.
> >
> > No pipeline at all is created unless you do
> >
> >git push -o ci.variable=QEMU_CI=1 
> >
> > that creates the pipeline but doesn't run any jobs - they're manual
> > start.
>
> Yes, sure, the jobs are not started automatically. But I *do* want to run
> the jobs before sending pull requests - but since the gitlab-CI minutes are
> now very limited, I'd like to avoid burning these minutes via gitlab and
> start those jobs directly on cirrus-ci.com again. For that the jobs would
> need to be moved to our .cirrus-ci.yml file again.
>
> Well, maybe we could also have both, jobs via cirrus-run for those who want
> to see them in their gitlab-CI dashboard, and via .cirrus-ci.yml for those
> who want to avoid burning CI minutes on Gitlab. It's a little bit of
> double-maintenance, but maybe acceptable?

I just noticed that qemu.git/master doesn't run Cirrus-CI. I guess it
hasn't been set up in our GitLab project.

Since it's not enabled for qemu.git/master nothing will change from my
perspective. Feel free to change it as you wish.

Stefan

Re: Should we maybe move Cirrus-CI jobs away from Gitlab again?

2022-09-27 Thread Thomas Huth


On 27/09/2022 19.57, Daniel P. Berrangé wrote:

On Tue, Sep 27, 2022 at 01:36:20PM -0400, Stefan Hajnoczi wrote:

On Tue, 27 Sept 2022 at 11:54, Daniel P. Berrangé  wrote:


On Tue, Sep 27, 2022 at 11:44:45AM -0400, Stefan Hajnoczi wrote:

On Tue, 27 Sept 2022 at 05:02, Thomas Huth  wrote:

now that Gitlab is giving us pressure on the amount of free CI minutes, I
wonder whether we should maybe move the Cirrus-CI jobs out of the gitlab-CI
dashboard again? We could add the jobs to our .cirrus-ci.yml file instead,
like we did it in former times...

Big advantage would be of course that the time for those jobs would not
count in the Gitlab-CI minutes anymore. Disadvantage is of course that they
do not show up in the gitlab-CI dashboard anymore, so there is no more
e-mail notification about failed jobs, and you have to push to github, too,
and finally check the results manually on cirrus-ci.com ...


My understanding is that .gitlab-ci.d/cirrus.yml uses a GitLab CI job
to run the cirrus-run container image that forwards jobs to Cirrus-CI.
So GitLab CI resources are consumed waiting for Cirrus-CI to finish.

This shouldn't affect gitlab.com/qemu-project where there are private
runners that do not consume GitLab CI minutes.

Individual developers are affected though because they most likely
rely on the GitLab shared runner minutes quota.


NB, none of the jobs should ever be run automatically anymore in
QEMU CI pipelines. It always requires the maintainer to set the
env var when pushing to git, to explicitly create a pipeline.
You can then selectively start each individual job as desired.


Cirrus CI is not automatically started when pushing to a personal
GitLab repo? If starting it requires manual action anyway then I think
nothing needs to be changed here.


No pipeline at all is created unless you do

   git push -o ci.variable=QEMU_CI=1 

that creates the pipeline but doesn't run any jobs - they're manual
start.


Yes, sure, the jobs are not started automatically. But I *do* want to run 
the jobs before sending pull requests - but since the gitlab-CI minutes are 
now very limited, I'd like to avoid burning these minutes via gitlab and 
start those jobs directly on cirrus-ci.com again. For that the jobs would 
need to be moved to our .cirrus-ci.yml file again.


Well, maybe we could also have both, jobs via cirrus-run for those who want 
to see them in their gitlab-CI dashboard, and via .cirrus-ci.yml for those 
who want to avoid burning CI minutes on Gitlab. It's a little bit of 
double-maintenance, but maybe acceptable?


 Thomas

Re: [PULL 00/14] s390x patches and slirp submodule removal

2022-09-27 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.



Re: [PULL 0/8] Net patches

2022-09-27 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.



Re: [PULL 0/3] M68k for 7.2 patches

2022-09-27 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.



Re: [PATCH v11 06/15] target/hexagon: expose next PC in DisasContext

2022-09-27 Thread Anton Johansson via


Missing the rationale. "The idef-parser will use it with IMM_NPC".

But I feel I'm missing something, what is the diff between 
IMM_PC/IMM_NPC?



I'll try to clarify.

Firstly, why do we need this patch? Hexagon instructions need access to
both the current pc and the next pc. For the generated helper functions
this is done through `env->gpr[HEX_REG_PC]` and `env->next_PC`. However,
for the TCG code generated by idef-parser we much prefer reading pc and
npc from `DisasContext`, as these are constant at translation time. We
can then "constant fold" by emitting C code for operations on pc and
npc instead of TCG code.

Secondly, what are IMM_NPC and IMM_PC? They are internal to the parser and
refer to types of the immediate value `HexImm`. An immediate with type
IMM_NPC can then easily be emitted as `ctx->npc`; similarly, IMM_PC gets
emitted as `ctx->base.pc_next`.
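
To make the mapping above concrete, here is a minimal sketch of how an emitter could turn an immediate type into a C expression string. It is an assumption-laden illustration, not the idef-parser code: the `HexImm` struct below is a simplified, hypothetical version, `IMM_LITERAL` and `emit_imm()` are invented names, and only the two output strings come from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the parser's immediate types. */
typedef enum {
    IMM_PC,
    IMM_NPC,
    IMM_LITERAL, /* hypothetical: a plain constant known to the parser */
} HexImmType;

/* Simplified, hypothetical immediate; the real HexImm has other fields. */
typedef struct {
    HexImmType type;
    unsigned long value; /* only meaningful for IMM_LITERAL */
} HexImm;

/*
 * Write the C expression for an immediate into buf.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int emit_imm(const HexImm *imm, char *buf, size_t len)
{
    const char *expr = NULL;
    int n;

    assert(imm && buf);
    switch (imm->type) {
    case IMM_PC:
        expr = "ctx->base.pc_next"; /* current pc, read from DisasContext */
        break;
    case IMM_NPC:
        expr = "ctx->npc";          /* next pc, read from DisasContext */
        break;
    case IMM_LITERAL:
        n = snprintf(buf, len, "0x%lx", imm->value);
        return (n >= 0 && (size_t)n < len) ? 0 : -1;
    }
    if (!expr || strlen(expr) >= len) {
        return -1;
    }
    memcpy(buf, expr, strlen(expr) + 1);
    return 0;
}
```

Since the emitted text is ordinary C that reads a translation-time value, any arithmetic on it is evaluated by the C compiler of the generated translator rather than being expressed as TCG ops.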


And why not use target_ulong?


Good idea, I'll change it to `target_ulong`

--
Anton Johansson,
rev.ng Labs Srl.

Re: [PATCH v11 03/15] target/hexagon: make slot number an unsigned

2022-09-27 Thread Anton Johansson via


Using 'unsigned' would keep consistency with the rest of the codebase.

Otherwise,
Reviewed-by: Philippe Mathieu-Daudé 


Good catch, I'll make sure to change the various `uint32_t` -> 
`unsigned`! :)


--
Anton Johansson,
rev.ng Labs Srl.

Re: Should we maybe move Cirrus-CI jobs away from Gitlab again?

2022-09-27 Thread Daniel P . Berrangé

On Tue, Sep 27, 2022 at 01:36:20PM -0400, Stefan Hajnoczi wrote:
> On Tue, 27 Sept 2022 at 11:54, Daniel P. Berrangé  wrote:
> >
> > On Tue, Sep 27, 2022 at 11:44:45AM -0400, Stefan Hajnoczi wrote:
> > > On Tue, 27 Sept 2022 at 05:02, Thomas Huth  wrote:
> > > > now that Gitlab is giving us pressure on the amount of free CI minutes, 
> > > > I
> > > > wonder whether we should maybe move the Cirrus-CI jobs out of the 
> > > > gitlab-CI
> > > > dashboard again? We could add the jobs to our .cirrus-ci.yml file 
> > > > instead,
> > > > like we did it in former times...
> > > >
> > > > Big advantage would be of course that the time for those jobs would not
> > > > count in the Gitlab-CI minutes anymore. Disadvantage is of course that 
> > > > they
> > > > do not show up in the gitlab-CI dashboard anymore, so there is no more
> > > > e-mail notification about failed jobs, and you have to push to github, 
> > > > too,
> > > > and finally check the results manually on cirrus-ci.com ...
> > >
> > > My understanding is that .gitlab-ci.d/cirrus.yml uses a GitLab CI job
> > > to run the cirrus-run container image that forwards jobs to Cirrus-CI.
> > > So GitLab CI resources are consumed waiting for Cirrus-CI to finish.
> > >
> > > This shouldn't affect gitlab.com/qemu-project where there are private
> > > runners that do not consume GitLab CI minutes.
> > >
> > > Individual developers are affected though because they most likely
> > > rely on the GitLab shared runner minutes quota.
> >
> > NB, none of the jobs should ever be run automatically anymore in
> > QEMU CI pipelines. It always requires the maintainer to set the
> > env var when pushing to git, to explicitly create a pipeline.
> > You can then selectively start each individual job as desired.
> 
> Cirrus CI is not automatically started when pushing to a personal
> GitLab repo? If starting it requires manual action anyway then I think
> nothing needs to be changed here.

No pipeline at all is created unless you do

  git push -o ci.variable=QEMU_CI=1 

that creates the pipeline but doesn't run any jobs - they're manual
start. Or QEMU_CI=2 creates & starts the jobs (like the old way we
had CI until a few months ago, which burns CI quota hugely).


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH 1/8] hw/avr: Add limited support for avr gpio registers

2022-09-27 Thread Michael Rolnik

Hi all,

Is there any kind of web UI where I can review it?
I can't find this patch on https://patchew.org/ (there is only a two-year-old
version: https://patchew.org/search?q=project%3AQEMU+%22hw%2Favr%22)

Thank you,
Michael Rolnik

On Mon, Sep 12, 2022 at 2:21 PM Heecheol Yang 
wrote:

> Add some of these features for AVR GPIO:
>
>   - GPIO I/O : PORTx registers
>   - Data Direction : DDRx registers
>   - DDRx toggling : PINx registers
>
> Following things are not supported yet:
>   - MCUR registers
>
> Signed-off-by: Heecheol Yang 
> Reviewed-by: Michael Rolnik 
> Message-Id: <
> dm6pr16mb247368dbd3447abecdd795d7e6...@dm6pr16mb2473.namprd16.prod.outlook.com
> >
> [PMD: Use AVR_GPIO_COUNT]
> Signed-off-by: Philippe Mathieu-Daudé 
> Message-Id: <20210313165445.2113938-4-f4...@amsat.org>
> ---
>  hw/avr/Kconfig |   1 +
>  hw/avr/atmega.c|   7 +-
>  hw/avr/atmega.h|   2 +
>  hw/gpio/Kconfig|   3 +
>  hw/gpio/avr_gpio.c | 138 +
>  hw/gpio/meson.build|   1 +
>  include/hw/gpio/avr_gpio.h |  53 ++
>  7 files changed, 203 insertions(+), 2 deletions(-)
>  create mode 100644 hw/gpio/avr_gpio.c
>  create mode 100644 include/hw/gpio/avr_gpio.h
>
> diff --git a/hw/avr/Kconfig b/hw/avr/Kconfig
> index d31298c3cc..16a57ced11 100644
> --- a/hw/avr/Kconfig
> +++ b/hw/avr/Kconfig
> @@ -3,6 +3,7 @@ config AVR_ATMEGA_MCU
>  select AVR_TIMER16
>  select AVR_USART
>  select AVR_POWER
> +select AVR_GPIO
>
>  config ARDUINO
>  select AVR_ATMEGA_MCU
> diff --git a/hw/avr/atmega.c b/hw/avr/atmega.c
> index a34803e642..f5fb3a5225 100644
> --- a/hw/avr/atmega.c
> +++ b/hw/avr/atmega.c
> @@ -282,8 +282,11 @@ static void atmega_realize(DeviceState *dev, Error **errp)
>  continue;
>  }
>  devname = g_strdup_printf("atmega-gpio-%c", 'a' + (char)i);
> -create_unimplemented_device(devname,
> -OFFSET_DATA + mc->dev[idx].addr, 3);
> +object_initialize_child(OBJECT(dev), devname, &s->gpio[i],
> +TYPE_AVR_GPIO);
> +sysbus_realize(SYS_BUS_DEVICE(&s->gpio[i]), &error_abort);
> +sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpio[i]), 0,
> +OFFSET_DATA + mc->dev[idx].addr);
>  g_free(devname);
>  }
>
> diff --git a/hw/avr/atmega.h b/hw/avr/atmega.h
> index a99ee15c7e..e2289d5744 100644
> --- a/hw/avr/atmega.h
> +++ b/hw/avr/atmega.h
> @@ -13,6 +13,7 @@
>
>  #include "hw/char/avr_usart.h"
>  #include "hw/timer/avr_timer16.h"
> +#include "hw/gpio/avr_gpio.h"
>  #include "hw/misc/avr_power.h"
>  #include "target/avr/cpu.h"
>  #include "qom/object.h"
> @@ -44,6 +45,7 @@ struct AtmegaMcuState {
>  DeviceState *io;
>  AVRMaskState pwr[POWER_MAX];
>  AVRUsartState usart[USART_MAX];
> +AVRGPIOState gpio[GPIO_MAX];
>  AVRTimer16State timer[TIMER_MAX];
>  uint64_t xtal_freq_hz;
>  };
> diff --git a/hw/gpio/Kconfig b/hw/gpio/Kconfig
> index f0e7405f6e..fde7019b2b 100644
> --- a/hw/gpio/Kconfig
> +++ b/hw/gpio/Kconfig
> @@ -13,3 +13,6 @@ config GPIO_PWR
>
>  config SIFIVE_GPIO
>  bool
> +
> +config AVR_GPIO
> +bool
> diff --git a/hw/gpio/avr_gpio.c b/hw/gpio/avr_gpio.c
> new file mode 100644
> index 00..cdb574ef0d
> --- /dev/null
> +++ b/hw/gpio/avr_gpio.c
> @@ -0,0 +1,138 @@
> +/*
> + * AVR processors GPIO registers emulation.
> + *
> + * Copyright (C) 2020 Heecheol Yang 
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 or
> + * (at your option) version 3 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qemu/module.h"
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "hw/sysbus.h"
> +#include "hw/irq.h"
> +#include "hw/gpio/avr_gpio.h"
> +#include "hw/qdev-properties.h"
> +
> +static void avr_gpio_reset(DeviceState *dev)
> +{
> +AVRGPIOState *gpio = AVR_GPIO(dev);
> +
> +gpio->reg.pin = 0u;
> +gpio->reg.ddr = 0u;
> +gpio->reg.port = 0u;
> +}
> +
> +static void avr_gpio_write_port(AVRGPIOState *s, uint64_t value)
> +{
> +uint8_t pin;
> +uint8_t cur_port_val = s->reg.port;
> +uint8_t cur_ddr_val = s->reg.ddr;
> +
> +for (pin = 0u; pin < AVR_GPIO_COUNT ; pin++) {
> +uint8_t cur_port_pin_val = cur_port_val & 0x01u;
> +uint8_t cur_ddr_pin_val = cur_ddr_val & 0x01u;
> +uint8_t

Re: virtiofsd: Any reason why there's not an "openat2" sandbox mode?

2022-09-27 Thread Colin Walters

On Tue, Sep 27, 2022, at 1:27 PM, German Maglione wrote:
>
>> > Now all the development has moved to rust virtiofsd.

Oh, awesome!!  The code there looks great.

> I could work on this for the next major version and see if anything breaks.
> But I prefer to add this as a compilation feature, instead of a command line
> option that we will then have to maintain for a while.

Hmm, what would be the issue with having the code there by default?  I think 
rather than any new command line option, we automatically use 
`openat2+RESOLVE_IN_ROOT` if the process is run as a nonzero uid.

> Also, I don't see it as a sandbox feature, as Stefan mentioned, a compromised
> process can call openat2() without RESOLVE_IN_ROOT. 

I'm a bit skeptical honestly about how secure the existing namespace code is 
against a compromised virtiofsd process.  The primary worry is guest filesystem 
traversals, right?  openat2+RESOLVE_IN_ROOT addresses that.  Plus being in Rust 
makes this dramatically safer.

> I did some tests with
> Landlock to lock virtiofsd inside the shared directory, but IIRC it requires a
> kernel 5.13

But yes, landlock and other things make sense, I just don't see these things as 
strongly linked.  IOW we shouldn't in my opinion block unprivileged virtiofsd 
on more sandboxing than openat2 already gives us.
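
For reference, a minimal sketch of what an openat2()+RESOLVE_IN_ROOT lookup
looks like from C. This is only an illustration, not virtiofsd code:
open_in_root() is a hypothetical helper name, and the sketch assumes kernel
headers that ship linux/openat2.h (Linux 5.6 or later). It goes through
syscall() directly rather than relying on a libc wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/openat2.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` with `rootfd` treated as the root of the
 * whole lookup. With RESOLVE_IN_ROOT, "..", absolute paths and absolute
 * symlinks are resolved as if the process had chroot()ed into rootfd.
 * Returns a new fd, or -1 with errno set (ENOSYS if the kernel lacks
 * openat2()).
 */
static int open_in_root(int rootfd, const char *path, int flags)
{
    struct open_how how = {
        .flags = (unsigned long long)flags,
        .resolve = RESOLVE_IN_ROOT,
    };

    assert(path != NULL);
    return (int)syscall(SYS_openat2, rootfd, path, &how, sizeof(how));
}
```

With rootfd opened on the shared directory, a guest-supplied path such as
"../../etc/passwd" is looked up as etc/passwd under the shared directory
instead of escaping to the host's /etc. Note that openat2() is strict about
its arguments: unknown flag bits, or a nonzero mode without O_CREAT or
O_TMPFILE, are rejected with EINVAL.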

Re: [PATCH v2] target/arm/kvm: Retry KVM_CREATE_VM call if it fails EINTR

2022-09-27 Thread Peter Maydell

On Tue, 27 Sept 2022 at 18:07, Eric Auger  wrote:
>
> Hi Peter,
>
> On 9/27/22 18:49, Peter Maydell wrote:
> > Occasionally the KVM_CREATE_VM ioctl can return EINTR, even though
> > there is no pending signal to be taken. In commit 94ccff13382055
> > we added a retry-on-EINTR loop to the KVM_CREATE_VM call in the
> > generic KVM code. Adopt the same approach for the use of the
> > ioctl in the Arm-specific KVM code (where we use it to create a
> > scratch VM for probing for various things).
> >
> > For more information, see the mailing list thread:
> > https://lore.kernel.org/qemu-devel/8735e0s1zw.wl-...@kernel.org/
> >
> > Reported-by: Vitaly Chikunov 
> > Signed-off-by: Peter Maydell 
> > ---
> > The view in the thread seems to be that this is a kernel bug (because
> > in QEMU's case there shouldn't be a signal to be delivered at this
> > point because of our signal handling strategy); so I've adopted the
> > same "just retry-on-EINTR for this specific ioctl" approach that
> > commit 94ccff13 did, rather than, for instance, something wider like
> > "make kvm_ioctl() and friends always retry on EINTR".
> >
> > v2: correctly check for -1 and errno is EINTR...
> > ---
> >  target/arm/kvm.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > index e5c1bd50d29..356199c9e25 100644
> > --- a/target/arm/kvm.c
> > +++ b/target/arm/kvm.c
> > @@ -79,7 +79,9 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
> >  if (max_vm_pa_size < 0) {
> >  max_vm_pa_size = 0;
> >  }
> > -vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
> > +do {
> > +vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
> > +} while (vmfd == -1 && errno == -EINTR);
> shouldn't it be errno == EINTR?

Augh. Yes.
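
To spell out the point for the archive: userspace sees failure as a -1 return
with errno holding the positive EINTR; the negative -EINTR form is the
in-kernel return convention. A small self-contained sketch of the intended
loop follows. fake_create_vm() is a made-up stand-in for the
ioctl(kvmfd, KVM_CREATE_VM, ...) call so that the example runs without
/dev/kvm; it is not QEMU code.

```c
#include <assert.h>
#include <errno.h>

/* Made-up stand-in for the ioctl: fails with EINTR a few times, then
 * returns a fake file descriptor. */
static int fake_eintr_left = 2;

static int fake_create_vm(void)
{
    if (fake_eintr_left > 0) {
        fake_eintr_left--;
        errno = EINTR;
        return -1;
    }
    return 42;
}

/* Retry only while the call failed (-1) and errno is the positive EINTR. */
static int create_vm_retry(void)
{
    int fd;

    do {
        fd = fake_create_vm();
    } while (fd == -1 && errno == EINTR);

    /* Postcondition: an EINTR failure is never returned to the caller. */
    assert(fd != -1 || errno != EINTR);
    return fd;
}
```

With the comparison written as errno == -EINTR the condition can never be
true, so the loop would run exactly once and the EINTR failure would still
reach the caller.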

-- PMM

[PATCH v2] linux-user: Add guest memory layout to exception dump

2022-09-27 Thread Helge Deller

When the emulation stops with a hard exception, it's very useful for
debugging purposes to dump the current guest memory layout (for an
example see /proc/self/maps) alongside the CPU registers.

The open_self_maps() function provides such a memory dump, but since
it's located in the syscall.c file, various changes (add #includes, make
this function externally visible, ...) are needed to be able to call it
from the existing EXCP_DUMP() macro.

This patch takes another approach by re-defining EXCP_DUMP() to call
target_exception_dump(), which lives in syscall.c. This consolidates the
log print functions and makes it possible to add the call that dumps the
memory layout.

Besides reducing the code footprint, this approach keeps the changes across
the various callers minimal, and keeps EXCP_DUMP() highlighted as an
important macro/function.

Signed-off-by: Helge Deller 

---

v2:
Based on feedback by Philippe Mathieu-Daudé, renamed the two functions
to excp_dump_file() and target_exception_dump(), and #define'ed
EXCP_DUMP() to target_exception_dump().
I intentionally did not replace all occurrences of EXCP_DUMP() by
target_exception_dump() as I think it's unnecessary and not beneficial.
If this is really wanted, I will send a v3.


diff --git a/linux-user/cpu_loop-common.h b/linux-user/cpu_loop-common.h
index 36ff5b14f2..e644d2ef90 100644
--- a/linux-user/cpu_loop-common.h
+++ b/linux-user/cpu_loop-common.h
@@ -23,18 +23,9 @@
 #include "exec/log.h"
 #include "special-errno.h"

-#define EXCP_DUMP(env, fmt, ...)\
-do {\
-CPUState *cs = env_cpu(env);\
-fprintf(stderr, fmt , ## __VA_ARGS__);  \
-fprintf(stderr, "Failing executable: %s\n", exec_path); \
-cpu_dump_state(cs, stderr, 0);  \
-if (qemu_log_separate()) {  \
-qemu_log(fmt, ## __VA_ARGS__);  \
-qemu_log("Failing executable: %s\n", exec_path);\
-log_cpu_state(cs, 0);   \
-}   \
-} while (0)
+void target_exception_dump(CPUArchState *env, const char *fmt, int code);
+#define EXCP_DUMP(env, fmt, code) \
+target_exception_dump(env, fmt, code)

 void target_cpu_copy_regs(CPUArchState *env, struct target_pt_regs *regs);
 #endif
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 2e954d8dbd..7d29c4c396 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -158,6 +158,7 @@
 #include "qapi/error.h"
 #include "fd-trans.h"
 #include "tcg/tcg.h"
+#include "cpu_loop-common.h"

 #ifndef CLONE_IO
 #define CLONE_IO0x8000  /* Clone io context */
@@ -8144,6 +8145,33 @@ static int is_proc_myself(const char *filename, const char *entry)
 return 0;
 }

+static void excp_dump_file(FILE *logfile, CPUArchState *env,
+  const char *fmt, int code)
+{
+if (logfile) {
+CPUState *cs = env_cpu(env);
+
+fprintf(logfile, fmt, code);
+fprintf(logfile, "Failing executable: %s\n", exec_path);
+cpu_dump_state(cs, logfile, 0);
+open_self_maps(env, fileno(logfile));
+}
+}
+
+void target_exception_dump(CPUArchState *env, const char *fmt, int code)
+{
+/* dump to console */
+excp_dump_file(stderr, env, fmt, code);
+
+/* dump to log file */
+if (qemu_log_separate()) {
+FILE *logfile = qemu_log_trylock();
+
+excp_dump_file(logfile, env, fmt, code);
+qemu_log_unlock(logfile);
+}
+}
+
 #if HOST_BIG_ENDIAN != TARGET_BIG_ENDIAN || \
 defined(TARGET_SPARC) || defined(TARGET_M68K) || defined(TARGET_HPPA)
 static int is_proc(const char *filename, const char *entry)

Re: Should we maybe move Cirrus-CI jobs away from Gitlab again?

2022-09-27 Thread Stefan Hajnoczi

On Tue, 27 Sept 2022 at 11:54, Daniel P. Berrangé  wrote:
>
> On Tue, Sep 27, 2022 at 11:44:45AM -0400, Stefan Hajnoczi wrote:
> > On Tue, 27 Sept 2022 at 05:02, Thomas Huth  wrote:
> > > now that Gitlab is giving us pressure on the amount of free CI minutes, I
> > > wonder whether we should maybe move the Cirrus-CI jobs out of the
> > > gitlab-CI dashboard again? We could add the jobs to our .cirrus-ci.yml
> > > file instead, like we did it in former times...
> > >
> > > Big advantage would be of course that the time for those jobs would not
> > > count in the Gitlab-CI minutes anymore. Disadvantage is of course that
> > > they do not show up in the gitlab-CI dashboard anymore, so there is no
> > > more e-mail notification about failed jobs, and you have to push to
> > > github, too, and finally check the results manually on cirrus-ci.com ...
> >
> > My understanding is that .gitlab-ci.d/cirrus.yml uses a GitLab CI job
> > to run the cirrus-run container image that forwards jobs to Cirrus-CI.
> > So GitLab CI resources are consumed waiting for Cirrus-CI to finish.
> >
> > This shouldn't affect gitlab.com/qemu-project where there are private
> > runners that do not consume GitLab CI minutes.
> >
> > Individual developers are affected though because they most likely
> > rely on the GitLab shared runner minutes quota.
>
> NB, none of the jobs should ever be run automatically anymore in
> QEMU CI pipelines. It always requires the maintainer to set the
> env var when pushing to git, to explicitly create a pipeline.
> You can then selectively start each individual job as desired.

Cirrus CI is not automatically started when pushing to a personal
GitLab repo? If starting it requires manual action anyway then I think
nothing needs to be changed here.

Stefan

Re: virtiofsd: Any reason why there's not an "openat2" sandbox mode?

2022-09-27 Thread German Maglione

On Tue, Sep 27, 2022 at 6:57 PM Vivek Goyal  wrote:
>
> On Tue, Sep 27, 2022 at 12:37:15PM -0400, Vivek Goyal wrote:
> > On Fri, Sep 09, 2022 at 05:24:03PM -0400, Colin Walters wrote:
> > > We previously had a chat here 
> > > https://lore.kernel.org/all/348d4774-bd5f-4832-bd7e-a21491fda...@www.fastmail.com/T/
> > > around virtiofsd and privileges and the case of trying to run virtiofsd 
> > > inside an unprivileged (Kubernetes) container.
> > >
> > > Right now we're still using 9p, and it has bugs (basically it seems like 
> > > the 9p inode flushing callback tries to allocate memory to send an RPC, 
> > > and this causes OOM problems)
> > > https://github.com/coreos/coreos-assembler/issues/1812
> > >
> > > Coming back to this...as of lately in Linux, there's support for strongly 
> > > isolated filesystem access via openat2():
> > > https://lwn.net/Articles/796868/
> > >
> > > Is there any reason we couldn't do an -o sandbox=openat2 ?  This operates 
> > > without any privileges at all, and should be usable (and secure enough) 
> > > in our use case.
> >
> > [ cc virtio-fs-list, german, sergio ]
> >
> > Hi Colin,
> >
> > Using openat2(RESOLVE_IN_ROOT) (if the kernel is new enough) sounds like a
> > good idea. We talked about it a few times but nobody ever wrote a patch to
> > implement it.
> >
> > And it probably makes sense with all the sandboxes (chroot(), namespaces).
> >
> > I am wondering that it probably should not be a new sandbox mode at all.
> > It probably should be the default if kernel offers openat2() syscall.
> >
> > Now all the development has moved to rust virtiofsd.
> >
> > https://gitlab.com/virtio-fs/virtiofsd
> >
> > C version of virtiofsd is just seeing small critical fixes.
> >
> > And rust version allows running unprivileged (inside a user namespace).
> > German is also working on allowing running unprivileged without
> > user namespaces but this will not allow arbitrary uid/gid switching.
> >
> > https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/136
> >
> > If one wants to run unprivileged and also do arbitrary uid/gid switching,
> > then you need to use user namespaces and map a range of subuid/subgid
> > into the user namespace virtiofsd is running in.
> >
> > If possible, please try to use rust virtiofsd for your situation. It's
> > already packaged for fedora.
> >
> > Coming back to original idea of using openat2(), I think we should
> > probably give it a try in rust virtiofsd and if it works, it should
> > work across all the sandboxing modes.
>
> Thinking more about it, enabling openat2() usage conditionally based on
> some option probably is not a bad idea. I was assuming that using
> openat2() by default will not break any of the existing use cases. But
> I am not sure. I have burnt my fingers so many times and had to back
> out on default settings that enabling usage of openat2() conditionally
> will probably be a safer choice. :-)
>

I could work on this for the next major version and see if anything breaks.
But I prefer to add this as a compilation feature, instead of a command line
option that we will then have to maintain for a while.

Also, I don't see it as a sandbox feature: as Stefan mentioned, a compromised
process can call openat2() without RESOLVE_IN_ROOT. I did some tests with
Landlock to lock virtiofsd inside the shared directory, but IIRC it requires
kernel 5.13.

Cheers,
-- 
German
