date:20220404

Re: [PATCH v4 2/2] Added parameter to take screenshot with screendump as PNG

2022-04-04 Thread Kshitij Suri




On 01/04/22 4:50 pm, Markus Armbruster wrote:

Dave, please have a look at the HMP compatibility issue in
hmp-command.hx below.

Kshitij Suri  writes:


Currently screendump only supports PPM format, which is un-compressed and not
standard.

If "standard" means "have to pay a standards organization $$$ to access
the spec", PPM is not standard.  If it means "widely supported", it
certainly is.  I'd drop "and not standard".  Suggestion, not demand.

Makes sense. Will modify it in the updated patch.



   Added a "format" parameter to qemu monitor screendump capabilities
to support PNG image capture using libpng. The param was added in QAPI schema
of screendump present in ui.json along with png_save() function which converts
pixman_image to PNG. HMP command equivalent was also modified to support the
feature.

Suggest to use imperative mood to describe the commit, and omit details
that aren't necessary here:

 Add a "format" parameter to QMP and HMP screendump command
   to support PNG image capture using libpng.

Yes, will reduce the verbosity of the commit message.

Example usage:
{ "execute": "screendump", "arguments": { "filename": "/tmp/image",
"format":"png" } }

Providing an example in the commit message is always nice, thanks!

Thank you!



Resolves: https://gitlab.com/qemu-project/qemu/-/issues/718

Signed-off-by: Kshitij Suri 

Reviewed-by: Daniel P. Berrangé 
---
  hmp-commands.hx|  11 ++---
  monitor/hmp-cmds.c |  12 +-
  qapi/ui.json   |  24 +--
  ui/console.c   | 101 +++--
  4 files changed, 136 insertions(+), 12 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8476277aa9..19b7cab595 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -244,11 +244,12 @@ ERST
  
  {

  .name   = "screendump",
-.args_type  = "filename:F,device:s?,head:i?",
-.params = "filename [device [head]]",
-.help   = "save screen from head 'head' of display device 'device' "
-  "into PPM image 'filename'",
-.cmd= hmp_screendump,
+.args_type  = "filename:F,format:s?,device:s?,head:i?",

Incompatible change: meaning of "screendump ONE TWO" changes from
filename=ONE, device=TWO to filename=ONE, format=TWO.

As HMP is not a stable interface, incompatible change is permissible.
But is this one wise?

Could we add the new argument at the end instead?

 .args_type  = "filename:F,device:s?,head:i?,format:s?",
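
To make the compatibility concern concrete, here is a minimal Python sketch of
positional binding. It is not the HMP parser: bind_positional() is a
hypothetical helper that only assumes arguments are bound to names in the
order args_type declares them, and it ignores the type letters.

```python
from typing import Dict, List


def bind_positional(args_type: str, tokens: List[str]) -> Dict[str, str]:
    """Bind command-line tokens to argument names in declaration order.

    Simplified model: each args_type entry looks like "name:T" or "name:T?",
    and only the name is used here.
    """
    names = [entry.split(":", 1)[0] for entry in args_type.split(",")]
    return dict(zip(names, tokens))


# The three declarations discussed in this thread.
OLD = "filename:F,device:s?,head:i?"
PATCHED = "filename:F,format:s?,device:s?,head:i?"
APPENDED = "filename:F,device:s?,head:i?,format:s?"
```

With the tokens ONE and TWO, OLD and APPENDED both bind TWO to "device",
while PATCHED binds it to "format". Under this model, appending the new
argument keeps existing two-argument invocations working.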

Could we do *without* an argument, and derive the format from the
filename extension?  .png means format=png, anything else format=ppm.
Would be a bad idea for QMP.  Okay for HMP?

Should I go ahead with extracting format from filename provided for HMP?
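
For reference, the rule suggested above (".png means format=png, anything else
format=ppm") is small enough to sketch. This is a Python illustration only;
format_from_filename() is a hypothetical name, and the case-sensitive match is
an assumption, since the thread does not say how "foo.PNG" should be treated.

```python
def format_from_filename(filename: str) -> str:
    """Pick the screendump format from the filename extension.

    Assumption: the match is case-sensitive, so only a literal ".png"
    suffix selects PNG; everything else falls back to PPM.
    """
    return "png" if filename.endswith(".png") else "ppm"
```

Note that the QMP example from the commit message uses "/tmp/image" without
an extension, which under this rule would yield PPM. That is one reason the
explicit "format" argument remains the better fit for QMP.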



+.params = "filename [format] [device [head]]",

This tells us that parameter format can be omitted like so

 screendump foo.ppm device-id

which isn't true.  Better: "filename [format [device [head]]]".

Thank you, will modify it!



+.help   = "save screen from head 'head' of display device 'device'"
+  "in specified format 'format' as image 'filename'."
+  "Currently only 'png' and 'ppm' formats are supported.",
+ .cmd= hmp_screendump,
  .coroutine  = true,
  },
  
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c

index 634968498b..2442bfa989 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1720,9 +1720,19 @@ hmp_screendump(Monitor *mon, const QDict *qdict)
  const char *filename = qdict_get_str(qdict, "filename");
  const char *id = qdict_get_try_str(qdict, "device");
  int64_t head = qdict_get_try_int(qdict, "head", 0);
+const char *input_format  = qdict_get_try_str(qdict, "format");
  Error *err = NULL;
+ImageFormat format;
  
-qmp_screendump(filename, id != NULL, id, id != NULL, head, &err);
+format = qapi_enum_parse(&ImageFormat_lookup, input_format,
+  IMAGE_FORMAT_PPM, &err);
+if (err) {
+goto end;
+}
+
+qmp_screendump(filename, id != NULL, id, id != NULL, head,
+   input_format != NULL, format, &err);
+end:
  hmp_handle_error(mon, err);
  }
  
diff --git a/qapi/ui.json b/qapi/ui.json

index 664da9e462..24371fce05 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -157,12 +157,27 @@
  ##
  { 'command': 'expire_password', 'boxed': true, 'data': 'ExpirePasswordOptions' }
  
+##

+# @ImageFormat:
+#
+# Supported image format types.
+#
+# @png: PNG format
+#
+# @ppm: PPM format
+#
+# Since: 7.1
+#
+##
+{ 'enum': 'ImageFormat',
+  'data': ['ppm', 'png'] }
+
  ##
  # @screendump:
  #
-# Write a PPM of the VGA screen to a file.
+# Capture the contents

Re: [RFC PATCH] python: add qmp-send program to send raw qmp commands to qemu

2022-04-04 Thread Markus Armbruster

Daniel P. Berrangé  writes:

> On Wed, Mar 16, 2022 at 10:54:55AM +0100, Damien Hedde wrote:
>> It takes an input file containing raw qmp commands (concatenated json
>> dicts) and send all commands one by one to a qmp server. When one
>> command fails, it exits.
>> 
>> As a convenience, it can also wrap the qemu process to avoid having
>> to start qemu in background. When wrapping qemu, the program returns
>> only when the qemu process terminates.
>> 
>> Signed-off-by: Damien Hedde 

[...]

>> I name that qmp-send as Daniel proposed, maybe qmp-test matches better
>> what I'm doing there ?
>
> 'qmp-test' is a use case specific name. I think it is better to
> name it based on functionality provided rather than anticipated
> use case, since use cases evolve over time, hence 'qmp-send'.

Well, it doesn't just send, it also receives.

qmpcat, like netcat and socat?

[...]
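
For readers following the naming discussion, the behaviour described in the
quoted patch (an input of concatenated JSON dicts, sent one at a time,
stopping at the first failure) can be sketched in Python as below. This is
not the code from the patch: iter_commands() and send_all() are hypothetical
names, and the "send" callable stands in for a real QMP connection.

```python
import json
from typing import Any, Callable, Dict, Iterator


def iter_commands(text: str) -> Iterator[Dict[str, Any]]:
    """Yield each JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def send_all(text: str,
             send: Callable[[Dict[str, Any]], Dict[str, Any]]) -> int:
    """Send commands in order; stop at the first error reply.

    Returns the number of commands that succeeded.
    """
    sent = 0
    for command in iter_commands(text):
        reply = send(command)
        if "error" in reply:
            break
        sent += 1
    return sent
```

Since the loop both sends a command and inspects the reply before moving on,
it illustrates the point made above that the tool does more than "send".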

Re: ping [PATCH-for-7.0 v2] qga/vss-win32: fix compilation with clang++

2022-04-04 Thread Helge Konetzka


ping

https://lore.kernel.org/qemu-devel/39400817-3dc9-516d-9096-bc1f68862...@zapateado.de/
https://patchew.org/QEMU/39400817-3dc9-516d-9096-bc1f68862...@zapateado.de/

Am 16.03.22 um 14:54 schrieb Helge Konetzka:

This fixes:

qga/vss-win32/install.cpp:49:24: error: cannot initialize a variable of
type 'char *' with an rvalue of type 'const char *'
     char *msg = NULL, *nul = strchr(text, '(');
    ^ ~

Signed-off-by: Helge Konetzka 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Philippe Mathieu-Daudé 
---
Compiling with clang++ of msys2 toolchain clang64 leads to

[1445/1747] Compiling C++ object qga/vss-win32/qga-vss.dll.p/install.cpp.obj

FAILED: qga/vss-win32/qga-vss.dll.p/install.cpp.obj
...
qga/vss-win32/install.cpp:49:24: error: cannot initialize a variable of type 'char *' with an rvalue of type 'const char *'

     char *msg = NULL, *nul = strchr(text, '(');
    ^ ~
1 error generated.
ninja: build stopped: subcommand failed.
make: *** [Makefile:163: run-ninja] Error 1
==> ERROR: A failure occurred in build().
     Aborting...
---
  qga/vss-win32/install.cpp | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qga/vss-win32/install.cpp b/qga/vss-win32/install.cpp
index 8076efe3cb..b57508fbe0 100644
--- a/qga/vss-win32/install.cpp
+++ b/qga/vss-win32/install.cpp
@@ -46,7 +46,8 @@ void errmsg(DWORD err, const char *text)
   * If text doesn't contains '(', negative precision is given, which is
   * treated as though it were missing.
   */
-    char *msg = NULL, *nul = strchr(text, '(');
+    char *msg = NULL;
+    const char *nul = strchr(text, '(');
  int len = nul ? nul - text : -1;

  FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |

[PATCH v3] hw/misc: applesmc: use host osk as default on macs

2022-04-04 Thread Pedro Tôrres

From: Pedro Tôrres 

When running on a Mac, QEMU is able to get the host OSK and use it as
the default value for the AppleSMC device. The OSK query operation
doesn't require administrator privileges and can be executed by any user
on the system. This patch is based on Phil Dennis-Jordan's description
of the process for reading OSK from SMC on macOS:
https://lists.nongnu.org/archive/html/qemu-devel/2021-10/msg02843.html

In the future, this could also be extended to work on Linux and Windows
when running on Macs. Just implement the applesmc_read_osk function for
those platforms.

The Apple SMC driver for Linux is currently being rewritten by Hector
Martin as part of the effort to bring Linux to Macs with Apple Silicon
(Asahi Linux). When the new driver gets merged to the Linux Kernel, it
will be a good time to extend this to work with it.

Signed-off-by: Pedro Tôrres 
---
 hw/misc/applesmc.c | 75 --
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/hw/misc/applesmc.c b/hw/misc/applesmc.c
index 81cd6b6423..c95e038bd2 100644
--- a/hw/misc/applesmc.c
+++ b/hw/misc/applesmc.c
@@ -5,6 +5,7 @@
  *
  *  Authors: Alexander Graf 
  *   Susanne Graf 
+ *   Pedro Tôrres 
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -28,8 +29,16 @@
  * This driver was mostly created by looking at the Linux AppleSMC driver
  * implementation and does not support IRQ.
  *
+ * Reading OSK from SMC on macOS was implemented based on Phil Dennis-Jordan's
+ * description of the process:
+ * https://lists.nongnu.org/archive/html/qemu-devel/2021-10/msg02843.html
+ *
  */
 
+#if defined(__APPLE__) && defined(__MACH__)
+#include <IOKit/IOKitLib.h>
+#endif
+
 #include "qemu/osdep.h"
 #include "hw/isa/isa.h"
 #include "hw/qdev-properties.h"
@@ -312,9 +321,62 @@ static const MemoryRegionOps applesmc_err_io_ops = {
 },
 };
 
+static bool applesmc_read_osk(uint8_t *osk)
+{
+#if defined(__APPLE__) && defined(__MACH__)
+struct AppleSMCParams {
+uint32_t key;
+uint8_t __pad0[16];
+uint8_t result;
+uint8_t __pad1[7];
+uint32_t size;
+uint8_t __pad2[10];
+uint8_t data8;
+uint8_t __pad3[5];
+uint8_t output[32];
+};
+
+io_service_t svc;
+io_connect_t conn;
+kern_return_t ret;
+size_t size = sizeof(struct AppleSMCParams);
+struct AppleSMCParams params_in = { .size = 32, .data8 = 5 };
+struct AppleSMCParams params_out = {};
+
+svc = IOServiceGetMatchingService(0, IOServiceMatching("AppleSMC"));
+if (svc == 0) {
+return false;
+}
+
+ret = IOServiceOpen(svc, mach_task_self(), 0, &conn);
+if (ret != 0) {
+return false;
+}
+
+for (params_in.key = 'OSK0'; params_in.key <= 'OSK1'; ++params_in.key) {
+ret = IOConnectCallStructMethod(conn, 2, &params_in, size, &params_out, &size);
+if (ret != 0) {
+return false;
+}
+
+if (params_out.result != 0) {
+return false;
+}
+memcpy(osk, params_out.output, params_in.size);
+
+osk += params_in.size;
+}
+
+return true;
+#else
+return false;
+#endif
+}
+
 static void applesmc_isa_realize(DeviceState *dev, Error **errp)
 {
 AppleSMCState *s = APPLE_SMC(dev);
+bool valid_osk = false;
 
 memory_region_init_io(&s->io_data, OBJECT(s), &applesmc_data_io_ops, s,
   "applesmc-data", 1);
@@ -331,8 +393,17 @@ static void applesmc_isa_realize(DeviceState *dev, Error **errp)
 isa_register_ioport(&s->parent_obj, &s->io_err,
 s->iobase + APPLESMC_ERR_PORT);
 
-if (!s->osk || (strlen(s->osk) != 64)) {
-warn_report("Using AppleSMC with invalid key");
+if (s->osk) {
+valid_osk = strlen(s->osk) == 64;
+} else {
+valid_osk = applesmc_read_osk((uint8_t *) default_osk);
+if (valid_osk) {
+warn_report("Using AppleSMC with host OSK");
+s->osk = default_osk;
+}
+}
+if (!valid_osk) {
+warn_report("Using AppleSMC with invalid OSK");
 s->osk = default_osk;
 }
 
-- 
2.32.0 (Apple Git-132)
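
The key-selection order in the realize hunk above can be summarised with a
short Python sketch. It mirrors the control flow of the patch only:
choose_osk(), the "source" labels and the DEFAULT_OSK placeholder are
hypothetical, and read_host_osk stands in for applesmc_read_osk(), returning
None where the C function returns false.

```python
from typing import Callable, Optional, Tuple

OSK_LEN = 64
# Placeholder only; the real default key lives in hw/misc/applesmc.c.
DEFAULT_OSK = "?" * OSK_LEN


def choose_osk(user_osk: Optional[str],
               read_host_osk: Callable[[], Optional[str]]) -> Tuple[str, str]:
    """Return (key, source) where source is "user", "host" or "invalid"."""
    if user_osk is not None:
        # A user-supplied key is accepted only if it is exactly 64 characters.
        if len(user_osk) == OSK_LEN:
            return user_osk, "user"
        return DEFAULT_OSK, "invalid"
    # No key given: try the host, which only succeeds on a Mac running macOS.
    host_osk = read_host_osk()
    if host_osk is not None:
        return host_osk, "host"
    return DEFAULT_OSK, "invalid"
```

As in the patch, a user-supplied key of the wrong length is never replaced by
the host key; the host is queried only when no key was given at all.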

RE: [PATCH v1] ui/gtk-egl: Check for a valid context before making EGL calls

2022-04-04 Thread Kasireddy, Vivek

Hi Marc-Andre,

> 
> Hi
> 
> On Mon, Mar 7, 2022 at 10:00 PM Kasireddy, Vivek
>  wrote:
> >
> > Hi Marc-Andre,
> >
> > >
> > > Hi Vivek
> > >
> > > On Mon, Mar 7, 2022 at 8:39 AM Vivek Kasireddy
> > >  wrote:
> > > >
> > > > Since not all listeners (i.e VirtualConsoles) of GL events have
> > > > a valid EGL context, make sure that there is a valid context
> > > > before making EGL calls.
> > > >
> > > > This fixes the following crash seen while launching the VM with
> > > > "-device virtio-gpu-pci,max_outputs=1,blob=true -display gtk,gl=on"
> > > >
> > > > No provider of eglCreateImageKHR found.  Requires one of:
> > > > EGL_KHR_image
> > > > EGL_KHR_image_base
> > > >
> > > > Fixes: 7cc712e9862ff ("ui: dispatch GL events to all listeners")
> > >
> > > I am not able to reproduce on current master.
> > [Kasireddy, Vivek] I can still see it with current master. I think this
> > issue is only seen when running Qemu in an Xorg based Host environment and
> > cannot be reproduced in a Wayland based environment -- as Qemu UI
> > uses the GLArea widget in the Wayland case where the EGL context
> > is managed by GTK.
> >
> > >
> > > Isn't it fixed with commit a9fbce5e9 ("ui/console: fix crash when
> > > using gl context with non-gl listeners") ?
> > [Kasireddy, Vivek] No, it unfortunately does not fix the issue I am seeing.
> > In my case, there are three VirtualConsoles created ("parallel0",
> > "compatmonitor0", "virtio-gpu-pci") and all three of them seem to have a
> > valid dpy_gl_scanout_dmabuf() but only virtio-gpu-pci has a valid EGL context.
> >
> > >
> > > Could you also check after "[PATCH v3 00/12] GL & D-Bus display related
> > > fixes" ?
> > [Kasireddy, Vivek] I can check but I don't think this issue can be fixed in
> > ui/console.c as all three VirtualConsoles pass the console_has_gl() check and
> > one of the only things that distinguishes them is whether they have a valid
> > EGL context.
> >
> 
> Under X11, I get the same error on v6.2.0 and master:
> qemu-system-x86_64  -m 4G -object
> memory-backend-memfd,id=mem,size=4G,share=on -machine
> q35,accel=kvm,memory-backend=mem -device
> virtio-gpu-pci,max_outputs=1,blob=true -display gtk,gl=on -cdrom
> rawhide.iso
> No provider of eglCreateImageKHR found.  Requires one of:
> EGL_KHR_image
> EGL_KHR_image_base
> 
> Note that with virtio-gpu-gl-pci I get:
> qemu-system-x86_64: ../src/dispatch_common.c:868:
> epoxy_get_proc_address: Assertion `0 && "Couldn't find current GLX or
> EGL context.\n"' failed.
[Kasireddy, Vivek] It looks like this particular error and the one I saw are
both resolved by this commit:
Author: Akihiko Odaki 
Date:   Sat Mar 26 01:12:16 2022 +0900

ui/console: Check console before emitting GL event

On a completely different note, I am wondering if you have any plan to
eventually integrate the Rust based Gtk4 client into Qemu source repo?
Or, is it going to stay out-of-tree even after it is no longer WIP?

Thanks,
Vivek

Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-04 Thread Andy Lutomirski

On Mon, Apr 4, 2022, at 10:06 AM, Sean Christopherson wrote:
> On Mon, Apr 04, 2022, Quentin Perret wrote:
>> On Friday 01 Apr 2022 at 12:56:50 (-0700), Andy Lutomirski wrote:
>> FWIW, there are a couple of reasons why I'd like to have in-place
>> conversions:
>> 
>>  - one goal of pKVM is to migrate some things away from the Arm
>>Trustzone environment (e.g. DRM and the likes) and into protected VMs
>>instead. This will give Linux a fighting chance to defend itself
>>against these things -- they currently have access to _all_ memory.
>>And transitioning pages between Linux and Trustzone (donations and
>>shares) is fast and non-destructive, so we really do not want pKVM to
>>regress by requiring the hypervisor to memcpy things;
>
> Is there actually a _need_ for the conversion to be non-destructive?  E.g. I
> assume the "trusted" side of things will need to be reworked to run as a pKVM
> guest, at which point reworking its logic to understand that conversions are
> destructive and slow-ish doesn't seem too onerous.
>
>>  - it can be very useful for protected VMs to do shared=>private
>>conversions. Think of a VM receiving some data from the host in a
>>shared buffer, and then it wants to operate on that buffer without
>>risking to leak confidential informations in a transient state. In
>>that case the most logical thing to do is to convert the buffer back
>>to private, do whatever needs to be done on that buffer (decrypting a
>>frame, ...), and then share it back with the host to consume it;
>
> If performance is a motivation, why would the guest want to do two
> conversions instead of just doing internal memcpy() to/from a private page?
> I would be quite surprised if multiple exits and TLB shootdowns is actually
> faster, especially at any kind of scale where zapping stage-2 PTEs will cause
> lock contention and IPIs.

I don't know the numbers or all the details, but this is arm64, which is a 
rather better architecture than x86 in this regard.  So maybe it's not so bad, 
at least in very simple cases, ignoring all implementation details.  (But see 
below.)  Also the systems in question tend to have fewer CPUs than some of the 
massive x86 systems out there.

If we actually wanted to support transitioning the same page between shared and 
private, though, we have a bit of an awkward situation.  Private to shared is 
conceptually easy -- do some bookkeeping, reconstitute the direct map entry, 
and it's done.  The other direction is a mess: all existing uses of the page 
need to be torn down.  If the page has been recently used for DMA, this 
includes IOMMU entries.

Quentin: let's ignore any API issues for now.  Do you have a concept of how a 
nondestructive shared -> private transition could work well, even in principle? 
 The best I can come up with is a special type of shared page that is not 
GUP-able and maybe not even mmappable, having a clear option for transitions to 
fail, and generally preventing the nasty cases from happening in the first 
place.

Maybe there could be a special mode for the private memory fds in which 
specific pages are marked as "managed by this fd but actually shared".  pread() 
and pwrite() would work on those pages, but not mmap().  (Or maybe mmap() but 
the resulting mappings would not permit GUP.)  And transitioning them would be 
a special operation on the fd that is specific to pKVM and wouldn't work on TDX 
or SEV.

Hmm.  Sean and Chao, are we making a bit of a mistake by making these fds 
technology-agnostic?  That is, would we want to distinguish between a TDX 
backing fd, a SEV backing fd, a software-based backing fd, etc?  API-wise this 
could work by requiring the fd to be bound to a KVM VM instance and possibly 
even configured a bit before any other operations would be allowed.

(Destructive transitions nicely avoid all the nasty cases.  If something is 
still pinning a shared page when it's "transitioned" to private (really just 
replaced with a new page), then the old page continues existing for as long as 
needed as a separate object.)

Re: [PATCH v9 18/45] hw/cxl/device: Implement MMIO HDM decoding (8.2.5.12)

2022-04-04 Thread Tong Zhang




> On Apr 4, 2022, at 8:14 AM, Jonathan Cameron via  
> wrote:
> 
> From: Ben Widawsky 
> 
> A device's volatile and persistent memory are known as Host-managed Device Memory
> (HDM) regions. The mechanism by which the device is programmed to claim
> the addresses associated with those regions is through dedicated logic
> known as the HDM decoder. In order to allow the OS to properly program
> the HDMs, the HDM decoders must be modeled.
> 
> There are two ways the HDM decoders can be implemented, the legacy
> mechanism is through the PCIe DVSEC programming from CXL 1.1 (8.1.3.8),
> and MMIO is found in 8.2.5.12 of the spec. For now, 8.1.3.8 is not
> implemented.
> 
> Much of CXL device logic is implemented in cxl-utils. The HDM decoder
> however is implemented directly by the device implementation.
> Whilst the implementation currently does no validity checks on the
> encoder set up, future work will add sanity checking specific to
> the type of cxl component.
> 
> Signed-off-by: Ben Widawsky 
> Co-developed-by: Jonathan Cameron 
> Signed-off-by: Jonathan Cameron 
> Reviewed-by: Alex Bennée 
> ---
> hw/mem/cxl_type3.c | 55 ++
> 1 file changed, 55 insertions(+)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 329a6ea2a9..5c93fbbd9b 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -50,6 +50,48 @@ static void build_dvsecs(CXLType3Dev *ct3d)
>GPF_DEVICE_DVSEC_REVID, dvsec);
> }
> 
> +static void hdm_decoder_commit(CXLType3Dev *ct3d, int which)
> +{
> +ComponentRegisters *cregs = &ct3d->cxl_cstate.crb;
> +uint32_t *cache_mem = cregs->cache_mem_registers;
> +
> +assert(which == 0);
> +
> +/* TODO: Sanity checks that the decoder is possible */
> +ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMIT, 0);
> +ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERR, 0);
> +
> +ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
> +}
> +
> +static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
> +   unsigned size)
> +{
> +CXLComponentState *cxl_cstate = opaque;
> +ComponentRegisters *cregs = &cxl_cstate->crb;
> +CXLType3Dev *ct3d = container_of(cxl_cstate, CXLType3Dev, cxl_cstate);
> +uint32_t *cache_mem = cregs->cache_mem_registers;
> +bool should_commit = false;
> +int which_hdm = -1;
> +
> +assert(size == 4);
> +g_assert(offset <= CXL2_COMPONENT_CM_REGION_SIZE);
> +

Looks like this will allow offset == CXL2_COMPONENT_CM_REGION_SIZE to pass the
check and cause a buffer overrun.
Shouldn't this be g_assert(offset < CXL2_COMPONENT_CM_REGION_SIZE)?
We also need to make sure that offset + 4 <= CXL2_COMPONENT_CM_REGION_SIZE.
Or maybe we just need offset + 4 <= CXL2_COMPONENT_CM_REGION_SIZE here, if
offset < CXL2_COMPONENT_CM_REGION_SIZE is already checked somewhere else.
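
The off-by-one can be checked with a few lines of Python. This is a sketch
under stated assumptions: REGION_SIZE is an arbitrary illustrative value (the
real CXL2_COMPONENT_CM_REGION_SIZE is not given in this thread), and
old_check() and access_fits() are hypothetical helpers.

```python
REGION_SIZE = 0x1000  # illustrative value only, not the real constant


def old_check(offset: int) -> bool:
    """Mirror of the quoted g_assert(offset <= REGION_SIZE)."""
    return offset <= REGION_SIZE


def access_fits(offset: int, size: int, region_size: int = REGION_SIZE) -> bool:
    """True if the bytes [offset, offset + size) lie entirely in the region."""
    return offset >= 0 and size > 0 and offset + size <= region_size
```

Here old_check(REGION_SIZE) is true although a 4-byte store at that offset
starts one byte past the end, whereas access_fits(REGION_SIZE, 4) is false
and the last accepted offset is REGION_SIZE - 4. Python integers do not
wrap; in C with unsigned types the same test is more safely written as
size <= region_size && offset <= region_size - size.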

> +switch (offset) {
> +case A_CXL_HDM_DECODER0_CTRL:
> +should_commit = FIELD_EX32(value, CXL_HDM_DECODER0_CTRL, COMMIT);
> +which_hdm = 0;
> +break;
> +default:
> +break;
> +}
> +
> +stl_le_p((uint8_t *)cache_mem + offset, value);
> +if (should_commit) {
> +hdm_decoder_commit(ct3d, which_hdm);
> +}
> +}
> +
> static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> {
> MemoryRegion *mr;
> @@ -93,6 +135,9 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
> ct3d->cxl_cstate.pdev = pci_dev;
> build_dvsecs(ct3d);
> 
> +regs->special_ops = g_new0(MemoryRegionOps, 1);
> +regs->special_ops->write = ct3d_reg_write;
> +
> cxl_component_register_block_init(OBJECT(pci_dev), cxl_cstate,
>   TYPE_CXL_TYPE3);
> 
> @@ -107,6 +152,15 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
>  &ct3d->cxl_dstate.device_registers);
> }
> 
> +static void ct3_exit(PCIDevice *pci_dev)
> +{
> +CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
> +CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
> +ComponentRegisters *regs = &cxl_cstate->crb;
> +
> +g_free(regs->special_ops);
> +}
> +
> static void ct3d_reset(DeviceState *dev)
> {
> CXLType3Dev *ct3d = CXL_TYPE3(dev);
> @@ -128,6 +182,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
> PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
> 
> pc->realize = ct3_realize;
> +pc->exit = ct3_exit;
> pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
> pc->vendor_id = PCI_VENDOR_ID_INTEL;
> pc->device_id = 0xd93; /* LVF for now */
> -- 
> 2.32.0
> 
> 
>

Re: [RFC PATCH] python: add qmp-send program to send raw qmp commands to qemu

2022-04-04 Thread John Snow

On Wed, Mar 16, 2022 at 5:55 AM Damien Hedde  wrote:
>
> It takes an input file containing raw qmp commands (concatenated json
> dicts) and send all commands one by one to a qmp server. When one
> command fails, it exits.
>
> As a convenience, it can also wrap the qemu process to avoid having
> to start qemu in background. When wrapping qemu, the program returns
> only when the qemu process terminates.
>
> Signed-off-by: Damien Hedde 
> ---
>
> Hi all,
>
> Following our discussion, I've started this. What do you think ?
>
> I tried to follow Daniel's qmp-shell-wrap. I think it is
> better to have similar options (eg: logging). There is also room
> for factorizing code if we want to keep them aligned and ease
> maintenance.
>
> There are still some pylint issues (too many branches in main and it
> does not like my context manager if else line). But it's kind of a
> mess to fix these so I think it's enough for a first version.

Yeah, don't worry about these. You can just tell pylint to shut up
while you prototype. Sometimes it's just not worth spending more time
on a more beautiful factoring. Oh well.

>
> I name that qmp-send as Daniel proposed, maybe qmp-test matches better
> what I'm doing there ?
>

I think I agree with Dan's response.

> Thanks,
> Damien
> ---
>  python/qemu/aqmp/qmp_send.py | 229 +++

I recommend putting this in qemu/util/qmp_send.py instead.

I'm in the process of pulling out the AQMP lib and hosting it
separately. Scripts like this I think should stay in the QEMU tree, so
moving it to util instead is probably best. Otherwise, I'll *really*
have to commit to the syntax, and that's probably a bigger hurdle than
you want to deal with.

>  scripts/qmp/qmp-send |  11 ++
>  2 files changed, 240 insertions(+)
>  create mode 100644 python/qemu/aqmp/qmp_send.py
>  create mode 100755 scripts/qmp/qmp-send
>
> diff --git a/python/qemu/aqmp/qmp_send.py b/python/qemu/aqmp/qmp_send.py
> new file mode 100644
> index 00..cbca1d0205
> --- /dev/null
> +++ b/python/qemu/aqmp/qmp_send.py
> @@ -0,0 +1,229 @@
> +#
> +# Copyright (C) 2022 Greensocs
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or
> +# later.  See the COPYING file in the top-level directory.
> +#
> +
> +"""
> +usage: qmp-send [-h] [-f FILE] [-s SOCKET] [-v] [-p] [--wrap ...]
> +
> +Send raw qmp commands to qemu as long as they succeed. It either connects to a
> +remote qmp server using the provided socket or wrap the qemu process. It stops
> +sending the provided commands when a command fails (disconnection or error
> +response).
> +
> +optional arguments:
> +  -h, --helpshow this help message and exit
> +  -f FILE, --file FILE  Input file containing the commands
> +  -s SOCKET, --socket SOCKET
> +< UNIX socket path | TCP address:port >
> +  -v, --verbose Verbose (echo commands sent and received)
> +  -p, --pretty  Pretty-print JSON
> +  --wrap ...QEMU command line to invoke
> +
> +When qemu wrap option is used, this script waits for qemu to terminate but
> +never send any quit or kill command. This needs to be done manually.
> +"""
> +
> +import argparse
> +import contextlib
> +import json
> +import logging
> +import os
> +from subprocess import Popen
> +import sys
> +from typing import List, TextIO
> +
> +from qemu.aqmp import ConnectError, QMPError, SocketAddrT
> +from qemu.aqmp.legacy import (
> +QEMUMonitorProtocol,
> +QMPBadPortError,
> +QMPMessage,
> +)
> +
> +
> +LOG = logging.getLogger(__name__)
> +
> +
> +class QmpRawDecodeError(Exception):
> +"""
> +Exception for raw qmp decoding
> +
> +msg: exception message
> +lineno: input line of the error
> +colno: input column of the error
> +"""
> +def __init__(self, msg: str, lineno: int, colno: int):
> +self.msg = msg
> +self.lineno = lineno
> +self.colno = colno
> +super().__init__(f"{msg}: line {lineno} column {colno}")
> +
> +
> +class QMPSendError(QMPError):
> +"""
> +QMP Send Base error class.
> +"""
> +
> +
> +class QMPSend(QEMUMonitorProtocol):
> +"""
> +QMP Send class.
> +"""
> +def __init__(self, address: SocketAddrT,
> + pretty: bool = False,
> + verbose: bool = False,
> + server: bool = False):
> +super().__init__(address, server=server)
> +self._verbose = verbose
> +self._pretty = pretty
> +self._server = server
> +
> +def setup_connection(self) -> None:
> +"""Setup the connetion with the remote client/server."""
> +if self._server:
> +self.accept()
> +else:
> +self.connect()
> +
> +def _print(self, qmp_message: object) -> None:
> +jsobj = json.dumps(qmp_message,
> +   indent=4 if self._pretty else None,
> +   sort_keys=self._pretty)
> +

Re: [PULL 0/3] ppc queue

2022-04-04 Thread Peter Maydell

On Mon, 4 Apr 2022 at 15:38, Cédric Le Goater  wrote:
>
> The following changes since commit bc6ec396d471d9e4aae7e2ff8b72e11da9a97665:
>
>   Merge tag 'pull-request-2022-04-01' of https://gitlab.com/thuth/qemu into staging (2022-04-02 09:36:07 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/legoater/qemu/ tags/pull-ppc-20220404
>
> for you to fetch changes up to 0798da8df9fd917515c957ae918d6d979cf5f3fb:
>
>   linux-user/ppc: Narrow type of ccr in save_user_regs (2022-04-04 08:49:06 +0200)
>
> 
> ppc-7.0 queue:
>
> * Coverity fixes
> * Fix for a memory leak issue
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM

Re: [qemu.qmp PATCH 02/13] fork qemu.qmp from qemu.git

2022-04-04 Thread John Snow

On Mon, Apr 4, 2022 at 2:54 PM John Snow  wrote:
>
> On Fri, Apr 1, 2022 at 1:05 PM Kashyap Chamarthy  wrote:
> >
> > On Wed, Mar 30, 2022 at 02:24:13PM -0400, John Snow wrote:
> > > Split python/ from qemu.git, using these commands:
> > >
> > > > git subtree split -P python/ -b python-split-v3
> > > > mkdir ~/src/tmp
> > > > cd ~/src/tmp
> > > > git clone --no-local --branch python-split-v3 --single-branch ~/src/qemu
> > > > cd qemu
> > > > git filter-repo --path qemu/machine/   \
> > >   --path qemu/utils/ \
> > >   --path tests/iotests-mypy.sh   \
> > >   --path tests/iotests-pylint.sh \
> > >   --invert-paths
> > >
> > > This commit, however, only performs some minimum cleanup to reflect the
> > > deletion of the other subpackages. It is not intended to be exhaustive,
> > > and further edits are made in forthcoming commits.
> > >
> > > These fixes are broken apart into micro-changes to facilitate mailing
> > > list review subject-by-subject. They *could* be squashed into a single
> > > larger commit on merge if desired, but due to the nature of the fork,
> > > bisectability across the fork boundary is going to be challenging
> > > anyway. It may be better value to just leave these initial commits
> > > as-is.
> > >
> > > Signed-off-by: John Snow 
> > > ---
> > >  .gitignore |  2 +-
> > >  Makefile   | 16 
> > >  setup.cfg  | 24 +---
> > >  setup.py   |  2 +-
> > >  4 files changed, 11 insertions(+), 33 deletions(-)
> >
> > The changes here look fine to me (and thanks for making it a "micro
> > change").  I'll let sharper eyes than mine to give a closer look at the
> > `git filter-repo` surgery.  Although, that looks fine to me too.
> >
> > [...]
> >
> > >  .PHONY: distclean
> > >  distclean: clean
> > > - rm -rf qemu.egg-info/ .venv/ .tox/ $(QEMU_VENV_DIR) dist/
> > > + rm -rf qemu.qmp.egg-info/ .venv/ .tox/ $(QEMU_VENV_DIR) dist/
> > >   rm -f .coverage .coverage.*
> > >   rm -rf htmlcov/
> > > diff --git a/setup.cfg b/setup.cfg
> > > index e877ea5..4ffab73 100644
> > > --- a/setup.cfg
> > > +++ b/setup.cfg
> > > @@ -1,5 +1,5 @@
> > >  [metadata]
> > > -name = qemu
> > > +name = qemu.qmp
> > >  version = file:VERSION
> > >  maintainer = QEMU Developer Team
> >
> > In the spirit of patch 04 ("update maintainer metadata"), do you also
> > want to update here too? s/QEMU Developer Team/QEMU Project?
> >
>
> Good spot.

...Or, uh. That's exactly what I update in patch 04. Are you asking me
to fold in that change earlier? I'm confused now.

--js

Re: [qemu.qmp PATCH 02/13] fork qemu.qmp from qemu.git

2022-04-04 Thread John Snow

On Fri, Apr 1, 2022 at 1:05 PM Kashyap Chamarthy  wrote:
>
> On Wed, Mar 30, 2022 at 02:24:13PM -0400, John Snow wrote:
> > Split python/ from qemu.git, using these commands:
> >
> > > git subtree split -P python/ -b python-split-v3
> > > mkdir ~/src/tmp
> > > cd ~/src/tmp
> > > git clone --no-local --branch python-split-v3 --single-branch ~/src/qemu
> > > cd qemu
> > > git filter-repo --path qemu/machine/   \
> >   --path qemu/utils/ \
> >   --path tests/iotests-mypy.sh   \
> >   --path tests/iotests-pylint.sh \
> >   --invert-paths
> >
> > This commit, however, only performs some minimum cleanup to reflect the
> > deletion of the other subpackages. It is not intended to be exhaustive,
> > and further edits are made in forthcoming commits.
> >
> > These fixes are broken apart into micro-changes to facilitate mailing
> > list review subject-by-subject. They *could* be squashed into a single
> > larger commit on merge if desired, but due to the nature of the fork,
> > bisectability across the fork boundary is going to be challenging
> > anyway. It may be better value to just leave these initial commits
> > as-is.
> >
> > Signed-off-by: John Snow 
> > ---
> >  .gitignore |  2 +-
> >  Makefile   | 16 
> >  setup.cfg  | 24 +---
> >  setup.py   |  2 +-
> >  4 files changed, 11 insertions(+), 33 deletions(-)
>
> The changes here look fine to me (and thanks for making it a "micro
> change").  I'll let sharper eyes than mine to give a closer look at the
> `git filter-repo` surgery.  Although, that looks fine to me too.
>
> [...]
>
> >  .PHONY: distclean
> >  distclean: clean
> > - rm -rf qemu.egg-info/ .venv/ .tox/ $(QEMU_VENV_DIR) dist/
> > + rm -rf qemu.qmp.egg-info/ .venv/ .tox/ $(QEMU_VENV_DIR) dist/
> >   rm -f .coverage .coverage.*
> >   rm -rf htmlcov/
> > diff --git a/setup.cfg b/setup.cfg
> > index e877ea5..4ffab73 100644
> > --- a/setup.cfg
> > +++ b/setup.cfg
> > @@ -1,5 +1,5 @@
> >  [metadata]
> > -name = qemu
> > +name = qemu.qmp
> >  version = file:VERSION
> >  maintainer = QEMU Developer Team
>
> In the spirit of patch 04 ("update maintainer metadata"), do you also
> want to update here too? s/QEMU Developer Team/QEMU Project?
>

Good spot.

> FWIW:
>
> Reviewed-by: Kashyap Chamarthy 
>
> [...]
>
> --
> /kashyap
>

[PATCH v5 8/9] s390x/pci: let intercept devices have separate PCI groups

2022-04-04 Thread Matthew Rosato

Let's use the reserved pool of simulated PCI groups to allow intercept
devices to have separate groups from interpreted devices as some group
values may be different. If we run out of simulated PCI groups, subsequent
intercept devices just get the default group.
Furthermore, if we encounter any PCI groups from hostdevs that are marked
as simulated, let's just assign them to the default group to avoid
conflicts between host simulated groups and our own simulated groups.

Reviewed-by: Pierre Morel 
Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-bus.c | 19 ++--
 hw/s390x/s390-pci-vfio.c| 40 ++---
 include/hw/s390x/s390-pci-bus.h |  6 -
 3 files changed, 59 insertions(+), 6 deletions(-)
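
A minimal sketch of the allocation policy described above, for readers who want it in isolation. The struct and function names are hypothetical, and the two constants are assumed values used only for illustration (the real definitions are in include/hw/s390x/s390-pci-bus.h, which this excerpt does not show).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only, not taken from this patch. */
#define SIM_GRP_START  0xF0u
#define DEFAULT_FN_GRP 0xFFu

/* Hypothetical stand-in for the next_sim_grp field added to S390pciState. */
struct sim_pool {
    uint8_t next_sim_grp;
};

static void sim_pool_init(struct sim_pool *p)
{
    assert(p);
    p->next_sim_grp = SIM_GRP_START;
}

/*
 * Hand out the next unused simulated group ID.  Once the counter reaches
 * the default group ID the pool is exhausted, and every later caller gets
 * the default group instead.
 */
static uint8_t sim_pool_assign(struct sim_pool *p)
{
    assert(p);
    if (p->next_sim_grp == DEFAULT_FN_GRP) {
        return DEFAULT_FN_GRP;
    }
    return p->next_sim_grp++;
}
```

Under these assumed constants the pool yields 15 distinct simulated groups before falling back to the default group.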

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 47918d2ce9..a222a8f4f7 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -748,13 +748,14 @@ static void s390_pci_iommu_free(S390pciState *s, PCIBus 
*bus, int32_t devfn)
 object_unref(OBJECT(iommu));
 }
 
-S390PCIGroup *s390_group_create(int id)
+S390PCIGroup *s390_group_create(int id, int host_id)
 {
 S390PCIGroup *group;
 S390pciState *s = s390_get_phb();
 
 group = g_new0(S390PCIGroup, 1);
 group->id = id;
+group->host_id = host_id;
 QTAILQ_INSERT_TAIL(&s->zpci_groups, group, link);
 return group;
 }
@@ -772,12 +773,25 @@ S390PCIGroup *s390_group_find(int id)
 return NULL;
 }
 
+S390PCIGroup *s390_group_find_host_sim(int host_id)
+{
+S390PCIGroup *group;
+S390pciState *s = s390_get_phb();
+
+QTAILQ_FOREACH(group, &s->zpci_groups, link) {
+if (group->id >= ZPCI_SIM_GRP_START && group->host_id == host_id) {
+return group;
+}
+}
+return NULL;
+}
+
 static void s390_pci_init_default_group(void)
 {
 S390PCIGroup *group;
 ClpRspQueryPciGrp *resgrp;
 
-group = s390_group_create(ZPCI_DEFAULT_FN_GRP);
+group = s390_group_create(ZPCI_DEFAULT_FN_GRP, ZPCI_DEFAULT_FN_GRP);
 resgrp = &group->zpci_group;
 resgrp->fr = 1;
 resgrp->dasm = 0;
@@ -825,6 +839,7 @@ static void s390_pcihost_realize(DeviceState *dev, Error 
**errp)
NULL, g_free);
 s->zpci_table = g_hash_table_new_full(g_int_hash, g_int_equal, NULL, NULL);
 s->bus_no = 0;
+s->next_sim_grp = ZPCI_SIM_GRP_START;
 QTAILQ_INIT(&s->pending_sei);
 QTAILQ_INIT(&s->zpci_devs);
 QTAILQ_INIT(&s->zpci_dma_limit);
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 4bf0a7e22d..985980f021 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -150,13 +150,18 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 {
 struct vfio_info_cap_header *hdr;
 struct vfio_device_info_cap_zpci_group *cap;
+S390pciState *s = s390_get_phb();
 ClpRspQueryPciGrp *resgrp;
 VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+uint8_t start_gid = pbdev->zpci_fn.pfgid;
 
 hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
 
-/* If capability not provided, just use the default group */
-if (hdr == NULL) {
+/*
+ * If capability not provided or the underlying hostdev is simulated, just
+ * use the default group.
+ */
+if (hdr == NULL || pbdev->zpci_fn.pfgid >= ZPCI_SIM_GRP_START) {
 trace_s390_pci_clp_cap(vpci->vbasedev.name,
VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
 pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
@@ -165,11 +170,40 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 }
 cap = (void *) hdr;
 
+/*
+ * For an intercept device, let's use an existing simulated group if one
+ * was already created for other intercept devices in this group.
+ * If not, create a new simulated group if any are still available.
+ * If all else fails, just fall back on the default group.
+ */
+if (!pbdev->interp) {
+pbdev->pci_group = s390_group_find_host_sim(pbdev->zpci_fn.pfgid);
+if (pbdev->pci_group) {
+/* Use existing simulated group */
+pbdev->zpci_fn.pfgid = pbdev->pci_group->id;
+return;
+} else {
+if (s->next_sim_grp == ZPCI_DEFAULT_FN_GRP) {
+/* All out of simulated groups, use default */
+trace_s390_pci_clp_cap(vpci->vbasedev.name,
+   VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
+pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
+pbdev->pci_group = s390_group_find(ZPCI_DEFAULT_FN_GRP);
+return;
+} else {
+/* We can assign a new simulated group */
+pbdev->zpci_fn.pfgid = s->next_sim_grp;
+s->next_sim_grp++;
+/* Fall through to create the new sim group using CLP info */
+}
+}
+}
+
 /* See if the PCI group is already

[PATCH v5 7/9] s390x/pci: enable adapter event notification for interpreted devices

2022-04-04 Thread Matthew Rosato

Use the associated kvm ioctl operation to enable adapter event notification
and forwarding for devices when requested.  This feature will be set up
with or without firmware assist based upon the 'forwarding_assist' setting.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-bus.c | 20 ++---
 hw/s390x/s390-pci-inst.c| 40 +++--
 hw/s390x/s390-pci-kvm.c | 30 +
 include/hw/s390x/s390-pci-bus.h |  1 +
 include/hw/s390x/s390-pci-kvm.h | 14 
 5 files changed, 100 insertions(+), 5 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 9c02d31250..47918d2ce9 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -190,7 +190,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
 rc = SCLP_RC_NO_ACTION_REQUIRED;
 break;
 default:
-if (pbdev->summary_ind) {
+if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
+/* Interpreted devices were using interrupt forwarding */
+s390_pci_kvm_aif_disable(pbdev);
+} else if (pbdev->summary_ind) {
 pci_dereg_irqs(pbdev);
 }
 if (pbdev->iommu->enabled) {
@@ -1078,6 +1081,7 @@ static void s390_pcihost_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 } else {
 DPRINTF("zPCI interpretation facilities missing.\n");
 pbdev->interp = false;
+pbdev->forwarding_assist = false;
 }
 }
 pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
@@ -1086,11 +1090,13 @@ static void s390_pcihost_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 if (!pbdev->interp) {
 /* Do vfio passthrough but intercept for I/O */
 pbdev->fh |= FH_SHM_VFIO;
+pbdev->forwarding_assist = false;
 }
 } else {
 pbdev->fh |= FH_SHM_EMUL;
 /* Always intercept emulated devices */
 pbdev->interp = false;
+pbdev->forwarding_assist = false;
 }
 
 if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
@@ -1240,7 +1246,10 @@ static void s390_pcihost_reset(DeviceState *dev)
 /* Process all pending unplug requests */
 QTAILQ_FOREACH_SAFE(pbdev, &s->zpci_devs, link, next) {
 if (pbdev->unplug_requested) {
-if (pbdev->summary_ind) {
+if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
+/* Interpreted devices were using interrupt forwarding */
+s390_pci_kvm_aif_disable(pbdev);
+} else if (pbdev->summary_ind) {
 pci_dereg_irqs(pbdev);
 }
 if (pbdev->iommu->enabled) {
@@ -1378,7 +1387,10 @@ static void s390_pci_device_reset(DeviceState *dev)
 break;
 }
 
-if (pbdev->summary_ind) {
+if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
+/* Interpreted devices were using interrupt forwarding */
+s390_pci_kvm_aif_disable(pbdev);
+} else if (pbdev->summary_ind) {
 pci_dereg_irqs(pbdev);
 }
 if (pbdev->iommu->enabled) {
@@ -1424,6 +1436,8 @@ static Property s390_pci_device_properties[] = {
 DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
 DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
 DEFINE_PROP_BOOL("interpret", S390PCIBusDevice, interp, true),
+DEFINE_PROP_BOOL("forwarding_assist", S390PCIBusDevice, forwarding_assist,
+ true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index c898c8abe9..c3a34da73d 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -1062,6 +1062,32 @@ static void fmb_update(void *opaque)
 timer_mod(pbdev->fmb_timer, t + pbdev->pci_group->zpci_group.mui);
 }
 
+static int mpcifc_reg_int_interp(S390PCIBusDevice *pbdev, ZpciFib *fib)
+{
+int rc;
+
+rc = s390_pci_kvm_aif_enable(pbdev, fib, pbdev->forwarding_assist);
+if (rc) {
+DPRINTF("Failed to enable interrupt forwarding\n");
+return rc;
+}
+
+return 0;
+}
+
+static int mpcifc_dereg_int_interp(S390PCIBusDevice *pbdev, ZpciFib *fib)
+{
+int rc;
+
+rc = s390_pci_kvm_aif_disable(pbdev);
+if (rc) {
+DPRINTF("Failed to disable interrupt forwarding\n");
+return rc;
+}
+
+return 0;
+}
+
 int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
 uintptr_t ra)
 {
@@ -1116,7 +1142,12 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, 
uint64_t fiba, uint8_t ar,
 
 switch (oc) {
 case ZPCI_MOD_FC_REG_INT:
-if (pbdev->summary_ind) {
+if (pbdev->interp) {
+if (mpcifc_reg_int_interp(pbdev, &fib)) {
+cc = ZPCI_PCI_LS_ERR;
+s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
+}
+

[PATCH v5 6/9] s390x/pci: don't fence interpreted devices without MSI-X

2022-04-04 Thread Matthew Rosato

Lack of MSI-X support is not an issue for interpreted passthrough
devices, so let's let these in.  This will allow, for example, ISM
devices to be passed through -- but only when interpretation is
available and being used.

Reviewed-by: Thomas Huth 
Reviewed-by: Pierre Morel 
Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 156051e6e9..9c02d31250 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -1093,7 +1093,7 @@ static void s390_pcihost_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 pbdev->interp = false;
 }
 
-if (s390_pci_msix_init(pbdev)) {
+if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
 error_setg(errp, "MSI-X support is mandatory "
"in the S390 architecture");
 return;
-- 
2.27.0

[PATCH v5 9/9] s390x/pci: reflect proper maxstbl for groups of interpreted devices

2022-04-04 Thread Matthew Rosato

The maximum supported store block length might be different depending
on whether the instruction is interpretively executed (firmware-reported
maximum) or handled via userspace intercept (host kernel API maximum).
Choose the best available value during group creation.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-vfio.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)
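
As a standalone illustration of the selection rule in this patch, here is a small sketch. The struct is a hypothetical stand-in that only mirrors the fields the diff touches; it is not the real vfio capability layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the zPCI group capability, for illustration. */
struct group_cap {
    uint16_t version;   /* capability header version */
    uint16_t maxstbl;   /* maximum when handled via userspace intercept */
    uint16_t imaxstbl;  /* maximum when interpreted; assumed valid from v2 */
};

/*
 * Pick the store block length to report for a group: the interpreted
 * maximum only when the device is interpreted and the capability is new
 * enough to carry it, otherwise the intercept maximum.
 */
static uint16_t pick_maxstbl(const struct group_cap *cap, bool interp)
{
    assert(cap);
    if (interp && cap->version >= 2) {
        return cap->imaxstbl;
    }
    return cap->maxstbl;
}
```

The version check matters because an older host kernel would not populate the interpreted value at all.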

diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 985980f021..212dd053f7 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -213,7 +213,11 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 resgrp->msia = cap->msi_addr;
 resgrp->mui = cap->mui;
 resgrp->i = cap->noi;
-resgrp->maxstbl = cap->maxstbl;
+if (pbdev->interp && hdr->version >= 2) {
+resgrp->maxstbl = cap->imaxstbl;
+} else {
+resgrp->maxstbl = cap->maxstbl;
+}
 resgrp->version = cap->version;
 resgrp->dtsm = ZPCI_DTSM;
 }
-- 
2.27.0

[PATCH v5 3/9] target/s390x: add zpci-interp to cpu models

2022-04-04 Thread Matthew Rosato

The zpci-interp feature is used to specify whether zPCI interpretation is
to be used for this guest.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-virtio-ccw.c  | 1 +
 target/s390x/cpu_features_def.h.inc | 1 +
 target/s390x/gen-features.c | 2 ++
 target/s390x/kvm/kvm.c  | 1 +
 4 files changed, 5 insertions(+)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 90480e7cf9..b190234308 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -805,6 +805,7 @@ static void ccw_machine_6_2_instance_options(MachineState 
*machine)
 static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V6_2 };
 
 ccw_machine_7_0_instance_options(machine);
+s390_cpudef_featoff_greater(14, 1, S390_FEAT_ZPCI_INTERP);
 s390_set_qemu_cpu_model(0x3906, 14, 2, qemu_cpu_feat);
 }
 
diff --git a/target/s390x/cpu_features_def.h.inc 
b/target/s390x/cpu_features_def.h.inc
index e86662bb3b..4ade3182aa 100644
--- a/target/s390x/cpu_features_def.h.inc
+++ b/target/s390x/cpu_features_def.h.inc
@@ -146,6 +146,7 @@ DEF_FEAT(SIE_CEI, "cei", SCLP_CPU, 43, "SIE: 
Conditional-external-interception f
 DEF_FEAT(DAT_ENH_2, "dateh2", MISC, 0, "DAT-enhancement facility 2")
 DEF_FEAT(CMM, "cmm", MISC, 0, "Collaborative-memory-management facility")
 DEF_FEAT(AP, "ap", MISC, 0, "AP instructions installed")
+DEF_FEAT(ZPCI_INTERP, "zpci-interp", MISC, 0, "zPCI interpretation")
 
 /* Features exposed via the PLO instruction. */
 DEF_FEAT(PLO_CL, "plo-cl", PLO, 0, "PLO Compare and load (32 bit in general 
registers)")
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 22846121c4..9db6bd545e 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -554,6 +554,7 @@ static uint16_t full_GEN14_GA1[] = {
 S390_FEAT_HPMA2,
 S390_FEAT_SIE_KSS,
 S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
+S390_FEAT_ZPCI_INTERP,
 };
 
 #define full_GEN14_GA2 EmptyFeat
@@ -650,6 +651,7 @@ static uint16_t default_GEN14_GA1[] = {
 S390_FEAT_GROUP_MSA_EXT_8,
 S390_FEAT_MULTIPLE_EPOCH,
 S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
+S390_FEAT_ZPCI_INTERP,
 };
 
 #define default_GEN14_GA2 EmptyFeat
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 6acf14d5ec..0357bfda89 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2294,6 +2294,7 @@ static int kvm_to_feat[][2] = {
 { KVM_S390_VM_CPU_FEAT_PFMFI, S390_FEAT_SIE_PFMFI},
 { KVM_S390_VM_CPU_FEAT_SIGPIF, S390_FEAT_SIE_SIGPIF},
 { KVM_S390_VM_CPU_FEAT_KSS, S390_FEAT_SIE_KSS},
+{ KVM_S390_VM_CPU_FEAT_ZPCI_INTERP, S390_FEAT_ZPCI_INTERP },
 };
 
 static int query_cpu_feat(S390FeatBitmap features)
-- 
2.27.0

[PATCH v5 2/9] vfio: tolerate migration protocol v1 uapi renames

2022-04-04 Thread Matthew Rosato

The v1 uapi is deprecated and will be replaced by v2 at some point;
this patch just tolerates the renaming of uapi fields to reflect
v1 / deprecated status.

Signed-off-by: Matthew Rosato 
---
 hw/vfio/common.c|  2 +-
 hw/vfio/migration.c | 19 +++
 2 files changed, 12 insertions(+), 9 deletions(-)
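
For readers unfamiliar with the mask/value calling convention seen in the hunks below, a rough sketch follows. It assumes the helper combines its arguments as (current & mask) | value, which this patch does not show, and it uses local illustrative bit names rather than the real uapi constants.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit assignments only; not the values from linux/vfio.h. */
#define ST_RUNNING   (1u << 0)
#define ST_V1_SAVING (1u << 1)
#define ST_RESUMING  (1u << 2)
#define ST_MASK      (ST_RUNNING | ST_V1_SAVING | ST_RESUMING)

/*
 * Compute the next device state: keep the bits selected by mask, then
 * set the bits in value.  Assumed combination rule, see the note above.
 */
static uint32_t next_state(uint32_t cur, uint32_t mask, uint32_t value)
{
    uint32_t next = (cur & mask) | value;

    assert((next & ~ST_MASK) == 0);
    return next;
}
```

With this rule, passing ~ST_RUNNING as the mask and ST_V1_SAVING as the value moves a running-and-saving device to stopped-and-saving, which matches the error message in vfio_save_complete_precopy().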

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 080046e3f5..7b1e12fb69 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -380,7 +380,7 @@ static bool 
vfio_devices_all_running_and_saving(VFIOContainer *container)
 return false;
 }
 
-if ((migration->device_state & VFIO_DEVICE_STATE_SAVING) &&
+if ((migration->device_state & VFIO_DEVICE_STATE_V1_SAVING) &&
 (migration->device_state & VFIO_DEVICE_STATE_RUNNING)) {
 continue;
 } else {
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index ff6b45de6b..e109cee551 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -432,7 +432,7 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
 }
 
 ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
-   VFIO_DEVICE_STATE_SAVING);
+   VFIO_DEVICE_STATE_V1_SAVING);
 if (ret) {
 error_report("%s: Failed to set state SAVING", vbasedev->name);
 return ret;
@@ -532,7 +532,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)
 int ret;
 
 ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_RUNNING,
-   VFIO_DEVICE_STATE_SAVING);
+   VFIO_DEVICE_STATE_V1_SAVING);
 if (ret) {
 error_report("%s: Failed to set state STOP and SAVING",
  vbasedev->name);
@@ -569,7 +569,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)
 return ret;
 }
 
-ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_SAVING, 0);
+ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING, 0);
 if (ret) {
 error_report("%s: Failed to set state STOPPED", vbasedev->name);
 return ret;
@@ -730,7 +730,7 @@ static void vfio_vmstate_change(void *opaque, bool running, 
RunState state)
  * start saving data.
  */
 if (state == RUN_STATE_SAVE_VM) {
-value = VFIO_DEVICE_STATE_SAVING;
+value = VFIO_DEVICE_STATE_V1_SAVING;
 } else {
 value = 0;
 }
@@ -768,8 +768,9 @@ static void vfio_migration_state_notifier(Notifier 
*notifier, void *data)
 case MIGRATION_STATUS_FAILED:
 bytes_transferred = 0;
 ret = vfio_migration_set_state(vbasedev,
-  ~(VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING),
-  VFIO_DEVICE_STATE_RUNNING);
+   ~(VFIO_DEVICE_STATE_V1_SAVING |
+ VFIO_DEVICE_STATE_RESUMING),
+   VFIO_DEVICE_STATE_RUNNING);
 if (ret) {
 error_report("%s: Failed to set state RUNNING", vbasedev->name);
 }
@@ -864,8 +865,10 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error 
**errp)
 goto add_blocker;
 }
 
-ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_MIGRATION,
-   VFIO_REGION_SUBTYPE_MIGRATION, &info);
+ret = vfio_get_dev_region_info(vbasedev,
+   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
+   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
+   &info);
 if (ret) {
 goto add_blocker;
 }
-- 
2.27.0

[PATCH v5 4/9] s390x/pci: add routine to get host function handle from CLP info

2022-04-04 Thread Matthew Rosato

In order to interface with the underlying host zPCI device, we need
to know its function handle.  Add a routine to grab this from the
vfio CLP capabilities chain.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-vfio.c | 83 ++--
 include/hw/s390x/s390-pci-vfio.h |  6 +++
 2 files changed, 73 insertions(+), 16 deletions(-)
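
The get_device_info() helper factored out below follows a grow-and-retry pattern around the argsz field. A self-contained sketch of that pattern is given here; fake_get_info() is a hypothetical stand-in for the ioctl and the struct only mimics the leading fields of the real one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header mimicking the leading fields of vfio_device_info. */
struct info_hdr {
    uint32_t argsz;
    uint32_t flags;
};

/*
 * Hypothetical stand-in for the ioctl: it always reports the size it
 * needs in argsz, and only fills the payload when the buffer is big
 * enough.
 */
static int fake_get_info(struct info_hdr *info, uint32_t needed)
{
    if (info->argsz < sizeof(*info)) {
        return -1;
    }
    if (info->argsz >= needed) {
        info->flags = 1;
    }
    info->argsz = needed;
    return 0;
}

/* Query once with a minimal buffer, then grow it and retry if asked to. */
static struct info_hdr *get_info(uint32_t needed)
{
    uint32_t argsz = sizeof(struct info_hdr);
    struct info_hdr *info = calloc(1, argsz);

    assert(needed >= sizeof(struct info_hdr));
    if (!info) {
        return NULL;
    }
    for (;;) {
        info->argsz = argsz;
        if (fake_get_info(info, needed)) {
            free(info);
            return NULL;
        }
        if (info->argsz <= argsz) {
            return info;    /* buffer was large enough */
        }
        argsz = info->argsz;
        struct info_hdr *bigger = realloc(info, argsz);
        if (!bigger) {
            free(info);
            return NULL;
        }
        info = bigger;
        memset(info, 0, argsz);
    }
}
```

The caller owns the returned buffer, which is why the two public wrappers in the patch can hold it in a g_autofree pointer.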

diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 6f80a47e29..4bf0a7e22d 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -124,6 +124,27 @@ static void s390_pci_read_base(S390PCIBusDevice *pbdev,
 pbdev->zpci_fn.pft = 0;
 }
 
+static bool get_host_fh(S390PCIBusDevice *pbdev, struct vfio_device_info *info,
+uint32_t *fh)
+{
+struct vfio_info_cap_header *hdr;
+struct vfio_device_info_cap_zpci_base *cap;
+VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+
+hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_BASE);
+
+/* Can only get the host fh with version 2 or greater */
+if (hdr == NULL || hdr->version < 2) {
+trace_s390_pci_clp_cap(vpci->vbasedev.name,
+   VFIO_DEVICE_INFO_CAP_ZPCI_BASE);
+return false;
+}
+cap = (void *) hdr;
+
+*fh = cap->fh;
+return true;
+}
+
 static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 struct vfio_device_info *info)
 {
@@ -217,25 +238,13 @@ static void s390_pci_read_pfip(S390PCIBusDevice *pbdev,
 memcpy(pbdev->zpci_fn.pfip, cap->pfip, CLP_PFIP_NR_SEGMENTS);
 }
 
-/*
- * This function will issue the VFIO_DEVICE_GET_INFO ioctl and look for
- * capabilities that contain information about CLP features provided by the
- * underlying host.
- * On entry, defaults have already been placed into the guest CLP response
- * buffers.  On exit, defaults will have been overwritten for any CLP features
- * found in the capability chain; defaults will remain for any CLP features not
- * found in the chain.
- */
-void s390_pci_get_clp_info(S390PCIBusDevice *pbdev)
+static struct vfio_device_info *get_device_info(S390PCIBusDevice *pbdev,
+uint32_t argsz)
 {
-g_autofree struct vfio_device_info *info = NULL;
+struct vfio_device_info *info = g_malloc0(argsz);
 VFIOPCIDevice *vfio_pci;
-uint32_t argsz;
 int fd;
 
-argsz = sizeof(*info);
-info = g_malloc0(argsz);
-
 vfio_pci = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
 fd = vfio_pci->vbasedev.fd;
 
@@ -250,7 +259,8 @@ retry:
 
 if (ioctl(fd, VFIO_DEVICE_GET_INFO, info)) {
 trace_s390_pci_clp_dev_info(vfio_pci->vbasedev.name);
-return;
+free(info);
+return NULL;
 }
 
 if (info->argsz > argsz) {
@@ -259,6 +269,47 @@ retry:
 goto retry;
 }
 
+return info;
+}
+
+/*
+ * Get the host function handle from the vfio CLP capabilities chain.  Returns
+ * true if a fh value was placed into the provided buffer.  Returns false
+ * if a fh could not be obtained (ioctl failed or capability version does
+ * not include the fh)
+ */
+bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh)
+{
+g_autofree struct vfio_device_info *info = NULL;
+
+assert(fh);
+
+info = get_device_info(pbdev, sizeof(*info));
+if (!info) {
+return false;
+}
+
+return get_host_fh(pbdev, info, fh);
+}
+
+/*
+ * This function will issue the VFIO_DEVICE_GET_INFO ioctl and look for
+ * capabilities that contain information about CLP features provided by the
+ * underlying host.
+ * On entry, defaults have already been placed into the guest CLP response
+ * buffers.  On exit, defaults will have been overwritten for any CLP features
+ * found in the capability chain; defaults will remain for any CLP features not
+ * found in the chain.
+ */
+void s390_pci_get_clp_info(S390PCIBusDevice *pbdev)
+{
+g_autofree struct vfio_device_info *info = NULL;
+
+info = get_device_info(pbdev, sizeof(*info));
+if (!info) {
+return;
+}
+
 /*
  * Find the CLP features provided and fill in the guest CLP responses.
  * Always call s390_pci_read_base first as information from this could
diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
index ff708aef50..0c2e4b5175 100644
--- a/include/hw/s390x/s390-pci-vfio.h
+++ b/include/hw/s390x/s390-pci-vfio.h
@@ -20,6 +20,7 @@ bool s390_pci_update_dma_avail(int fd, unsigned int *avail);
 S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
   S390PCIBusDevice *pbdev);
 void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
+bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh);
 void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
 #else
 static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
@@ -33,6 +34,11 @@ static inline S390PCIDMACount

[PATCH v5 1/9] Update linux headers

2022-04-04 Thread Matthew Rosato

This is a placeholder that pulls in 5.18-rc1 + unmerged kernel changes
required by this item.  A proper header sync can be done once the
associated kernel code merges.

Signed-off-by: Matthew Rosato 
---
 .../linux/input-event-codes.h |   4 +-
 .../standard-headers/linux/virtio_config.h|   6 +
 .../standard-headers/linux/virtio_crypto.h|  82 +++-
 linux-headers/asm-arm64/kvm.h |  16 +
 linux-headers/asm-generic/mman-common.h   |   2 +
 linux-headers/asm-mips/mman.h |   2 +
 linux-headers/asm-s390/kvm.h  |   1 +
 linux-headers/linux/kvm.h |  50 ++-
 linux-headers/linux/psci.h|   4 +
 linux-headers/linux/userfaultfd.h |   8 +-
 linux-headers/linux/vfio.h| 406 +-
 linux-headers/linux/vfio_zdev.h   |   7 +
 linux-headers/linux/vhost.h   |   7 +
 13 files changed, 376 insertions(+), 219 deletions(-)

diff --git a/include/standard-headers/linux/input-event-codes.h 
b/include/standard-headers/linux/input-event-codes.h
index b5e86b40ab..e36c01003a 100644
--- a/include/standard-headers/linux/input-event-codes.h
+++ b/include/standard-headers/linux/input-event-codes.h
@@ -278,7 +278,8 @@
 #define KEY_PAUSECD  201
 #define KEY_PROG3  202
 #define KEY_PROG4  203
-#define KEY_DASHBOARD  204 /* AL Dashboard */
+#define KEY_ALL_APPLICATIONS   204 /* AC Desktop Show All Applications */
+#define KEY_DASHBOARD  KEY_ALL_APPLICATIONS
 #define KEY_SUSPEND  205
 #define KEY_CLOSE  206 /* AC Close */
 #define KEY_PLAY   207
@@ -612,6 +613,7 @@
 #define KEY_ASSISTANT  0x247   /* AL Context-aware desktop assistant */
 #define KEY_KBD_LAYOUT_NEXT0x248   /* AC Next Keyboard Layout Select */
 #define KEY_EMOJI_PICKER   0x249   /* Show/hide emoji picker (HUTRR101) */
+#define KEY_DICTATE  0x24a   /* Start or Stop Voice Dictation Session (HUTRR99) */
 
 #define KEY_BRIGHTNESS_MIN 0x250   /* Set Brightness to Minimum */
 #define KEY_BRIGHTNESS_MAX 0x251   /* Set Brightness to Maximum */
diff --git a/include/standard-headers/linux/virtio_config.h 
b/include/standard-headers/linux/virtio_config.h
index 22e3a85f67..7acd8d4abc 100644
--- a/include/standard-headers/linux/virtio_config.h
+++ b/include/standard-headers/linux/virtio_config.h
@@ -80,6 +80,12 @@
 /* This feature indicates support for the packed virtqueue layout. */
 #define VIRTIO_F_RING_PACKED   34
 
+/*
+ * Inorder feature indicates that all buffers are used by the device
+ * in the same order in which they have been made available.
+ */
+#define VIRTIO_F_IN_ORDER  35
+
 /*
  * This feature indicates that memory accesses by the driver and the
  * device are ordered in a way described by the platform.
diff --git a/include/standard-headers/linux/virtio_crypto.h 
b/include/standard-headers/linux/virtio_crypto.h
index 5ff0b4ee59..68066dafb6 100644
--- a/include/standard-headers/linux/virtio_crypto.h
+++ b/include/standard-headers/linux/virtio_crypto.h
@@ -37,6 +37,7 @@
 #define VIRTIO_CRYPTO_SERVICE_HASH   1
 #define VIRTIO_CRYPTO_SERVICE_MAC  2
 #define VIRTIO_CRYPTO_SERVICE_AEAD   3
+#define VIRTIO_CRYPTO_SERVICE_AKCIPHER 4
 
 #define VIRTIO_CRYPTO_OPCODE(service, op)   (((service) << 8) | (op))
 
@@ -57,6 +58,10 @@ struct virtio_crypto_ctrl_header {
   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x02)
 #define VIRTIO_CRYPTO_AEAD_DESTROY_SESSION \
   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x03)
+#define VIRTIO_CRYPTO_AKCIPHER_CREATE_SESSION \
+  VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x04)
+#define VIRTIO_CRYPTO_AKCIPHER_DESTROY_SESSION \
+  VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x05)
uint32_t opcode;
uint32_t algo;
uint32_t flag;
@@ -180,6 +185,58 @@ struct virtio_crypto_aead_create_session_req {
uint8_t padding[32];
 };
 
+struct virtio_crypto_rsa_session_para {
+#define VIRTIO_CRYPTO_RSA_RAW_PADDING   0
+#define VIRTIO_CRYPTO_RSA_PKCS1_PADDING 1
+   uint32_t padding_algo;
+
+#define VIRTIO_CRYPTO_RSA_NO_HASH   0
+#define VIRTIO_CRYPTO_RSA_MD2   1
+#define VIRTIO_CRYPTO_RSA_MD3   2
+#define VIRTIO_CRYPTO_RSA_MD4   3
+#define VIRTIO_CRYPTO_RSA_MD5   4
+#define VIRTIO_CRYPTO_RSA_SHA1  5
+#define VIRTIO_CRYPTO_RSA_SHA256  6
+#define VIRTIO_CRYPTO_RSA_SHA384  7
+#define VIRTIO_CRYPTO_RSA_SHA512  8
+#define VIRTIO_CRYPTO_RSA_SHA224  9
+   uint32_t hash_algo;
+};
+
+struct virtio_crypto_ecdsa_session_para {
+#define VIRTIO_CRYPTO_CURVE_UNKNOWN   0
+#define VIRTIO_CRYPTO_CURVE_NIST_P192 1
+#define VIRTIO_CRYPTO_CURVE_NIST_P224 2
+#define VIRTIO_CRYPTO_CURVE_NIST_P256 3
+#define VIRTIO_CRYPTO_CURVE_NIST_P384 4
+#define VIRTIO_CRYPTO_CURVE_NIST_P521 5
+   uint32_t curve_id;
+

[PATCH v5 5/9] s390x/pci: enable for load/store intepretation

2022-04-04 Thread Matthew Rosato

If the appropriate CPU facility is available as well as the necessary
ZPCI_OP ioctl, then the underlying KVM host will enable load/store
interpretation for any guest device without a SHM bit in the guest
function handle.  For a device that will be using interpretation
support, ensure the guest function handle matches the host function
handle; this value is re-checked every time the guest issues a SET PCI FN
to enable the guest device as it is the only opportunity to reflect
function handle changes.

By default, unless interpret=off is specified, interpretation support will
always be assumed and exploited if the necessary ioctl and features are
available on the host kernel.  When these are unavailable, we will silently
revert to the interception model; this allows existing guest configurations
to work unmodified on hosts with and without zPCI interpretation support,
allowing QEMU to choose the best support model available.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/meson.build|  1 +
 hw/s390x/s390-pci-bus.c | 66 -
 hw/s390x/s390-pci-inst.c| 12 ++
 hw/s390x/s390-pci-kvm.c | 21 +++
 include/hw/s390x/s390-pci-bus.h |  1 +
 include/hw/s390x/s390-pci-kvm.h | 24 
 target/s390x/kvm/kvm.c  |  7 
 target/s390x/kvm/kvm_s390x.h|  1 +
 8 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 hw/s390x/s390-pci-kvm.c
 create mode 100644 include/hw/s390x/s390-pci-kvm.h
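
The handle manipulation in s390_pci_interp_plug() below boils down to two mask operations. A minimal sketch follows; the mask values are assumptions chosen for illustration (the real FH_MASK_* definitions are not part of this excerpt) and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mask layout for illustration only. */
#define FH_MASK_ENABLE 0x80000000u
#define FH_MASK_INDEX  0x0000ffffu

/*
 * Derive the handle initially shown to the guest from the host handle:
 * same handle, but with the enable bit cleared so the device starts out
 * disabled until the guest enables it.
 */
static uint32_t guest_fh_from_host(uint32_t host_fh)
{
    uint32_t fh = host_fh & ~FH_MASK_ENABLE;

    assert((fh & FH_MASK_ENABLE) == 0);
    return fh;
}

/* Extract the index portion that is used as the lookup key. */
static uint32_t fh_index(uint32_t fh)
{
    return fh & FH_MASK_INDEX;
}
```

Because the index now comes from the host handle, it can collide with an index already in use, which is why the patch checks s390_pci_find_dev_by_idx() before re-keying the table entry.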

diff --git a/hw/s390x/meson.build b/hw/s390x/meson.build
index 28484256ec..6e6e47fcda 100644
--- a/hw/s390x/meson.build
+++ b/hw/s390x/meson.build
@@ -23,6 +23,7 @@ s390x_ss.add(when: 'CONFIG_KVM', if_true: files(
   's390-skeys-kvm.c',
   's390-stattrib-kvm.c',
   'pv.c',
+  's390-pci-kvm.c',
 ))
 s390x_ss.add(when: 'CONFIG_TCG', if_true: files(
   'tod-tcg.c',
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 4b2bdd94b3..156051e6e9 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -16,6 +16,7 @@
 #include "qapi/visitor.h"
 #include "hw/s390x/s390-pci-bus.h"
 #include "hw/s390x/s390-pci-inst.h"
+#include "hw/s390x/s390-pci-kvm.h"
 #include "hw/s390x/s390-pci-vfio.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/qdev-properties.h"
@@ -971,12 +972,51 @@ static void s390_pci_update_subordinate(PCIDevice *dev, 
uint32_t nr)
 }
 }
 
+static int s390_pci_interp_plug(S390pciState *s, S390PCIBusDevice *pbdev)
+{
+uint32_t idx, fh;
+
+if (!s390_pci_get_host_fh(pbdev, &fh)) {
+return -EPERM;
+}
+
+/*
+ * The host device is already in an enabled state, but we always present
+ * the initial device state to the guest as disabled (ZPCI_FS_DISABLED).
+ * Therefore, mask off the enable bit from the passthrough handle until
+ * the guest issues a CLP SET PCI FN later to enable the device.
+ */
+pbdev->fh = fh & ~FH_MASK_ENABLE;
+
+/* Next, see if the idx is already in-use */
+idx = pbdev->fh & FH_MASK_INDEX;
+if (pbdev->idx != idx) {
+if (s390_pci_find_dev_by_idx(s, idx)) {
+return -EINVAL;
+}
+/*
+ * Update the idx entry with the passed through idx
+ * If the relinquished idx is lower than next_idx, use it
+ * to replace next_idx
+ */
+g_hash_table_remove(s->zpci_table, &pbdev->idx);
+if (idx < s->next_idx) {
+s->next_idx = idx;
+}
+pbdev->idx = idx;
+g_hash_table_insert(s->zpci_table, &pbdev->idx, pbdev);
+}
+
+return 0;
+}
+
 static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
   Error **errp)
 {
 S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
 PCIDevice *pdev = NULL;
 S390PCIBusDevice *pbdev = NULL;
+int rc;
 
 if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
 PCIBridge *pb = PCI_BRIDGE(dev);
@@ -1022,12 +1062,35 @@ static void s390_pcihost_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 set_pbdev_info(pbdev);
 
 if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
-pbdev->fh |= FH_SHM_VFIO;
+/*
+ * By default, interpretation is always requested; if the available
+ * facilities indicate it is not available, fallback to the
+ * interception model.
+ */
+if (pbdev->interp) {
+if (s390_pci_kvm_interp_allowed()) {
+rc = s390_pci_interp_plug(s, pbdev);
+if (rc) {
+error_setg(errp, "Plug failed for zPCI device in "
+   "interpretation mode: %d", rc);
+return;
+}
+} else {
+DPRINTF("zPCI interpretation facilities missing.\n");
+pbdev->interp = false;
+}
+}
 pbdev->iommu->dma_limit =

[PATCH v5 0/9] s390x/pci: zPCI interpretation support

2022-04-04 Thread Matthew Rosato

For QEMU, the majority of the work in enabling instruction interpretation   
is handled via SHM bit settings (to indicate to firmware whether or not
interpretive execution facilities are to be used) + a new KVM ioctl is
used to setup firmware-interpreted forwarding of Adapter Event
Notifications.

This series also adds a new, optional 'interpret' parameter to zpci which   
can be used to disable interpretation support (interpret=off) as well as
a 'forwarding_assist' parameter to determine whether or not the firmware
assist will be used for adapter event delivery (default when
interpretation is in use) or whether the host will be responsible for
delivering all adapter event notifications (forwarding_assist=off).

The ZPCI_INTERP CPU feature is added beginning with the z14 model to
enable this support.

As a consequence of implementing zPCI interpretation, ISM devices now   
become eligible for passthrough (but only when zPCI interpretation is   
available). 

From the perspective of guest configuration, you pass through zPCI devices
in the same manner as before, with interpretation support being used by
default if available in kernel+qemu.

Will reply with a link to the associated kernel series. 
   
   
Changelog v4->v5:
- Update to match latest interface from kernel code.  Major changes are:
  1) we no longer issue any ioctls to set a device to interpreted mode;
  rather, this will be done automatically if supported by the host kernel
  at the time the vfio group is associated with the KVM.  Then, the SHM
  bit setting will indicate whether or not interpretation is actually
  used.
  2) the RPCIT enhancements (IOMMU changes) are removed from this series,
  so the code associated with indicating a desired IOMMU is also
  removed.  With this series s390x-pci will continue to use only type1
  IOMMU for now.
- Refresh the linux headers sync.  Added a patch to tolerate some vfio
  uapi renames that will happen in 5.18 (this can be discarded if there
  is something else underway to address this)

Matthew Rosato (9):
  Update linux headers
  vfio: tolerate migration protocol v1 uapi renames
  target/s390x: add zpci-interp to cpu models
  s390x/pci: add routine to get host function handle from CLP info
  s390x/pci: enable for load/store intepretation
  s390x/pci: don't fence interpreted devices without MSI-X
  s390x/pci: enable adapter event notification for interpreted devices
  s390x/pci: let intercept devices have separate PCI groups
  s390x/pci: reflect proper maxstbl for groups of interpreted devices

 hw/s390x/meson.build  |   1 +
 hw/s390x/s390-pci-bus.c   | 107 -
 hw/s390x/s390-pci-inst.c  |  52 ++-
 hw/s390x/s390-pci-kvm.c   |  51 +++
 hw/s390x/s390-pci-vfio.c  | 129 +-
 hw/s390x/s390-virtio-ccw.c|   1 +
 hw/vfio/common.c  |   2 +-
 hw/vfio/migration.c   |  19 +-
 include/hw/s390x/s390-pci-bus.h   |   8 +-
 include/hw/s390x/s390-pci-kvm.h   |  38 ++
 include/hw/s390x/s390-pci-vfio.h  |   6 +
 .../linux/input-event-codes.h |   4 +-
 .../standard-headers/linux/virtio_config.h|   6 +
 .../standard-headers/linux/virtio_crypto.h|  82 +++-
 linux-headers/asm-arm64/kvm.h |  16 +
 linux-headers/asm-generic/mman-common.h   |   2 +
 linux-headers/asm-mips/mman.h |   2 +
 linux-headers/asm-s390/kvm.h  |   1 +
 linux-headers/linux/kvm.h |  50 ++-
 linux-headers/linux/psci.h|   4 +
 linux-headers/linux/userfaultfd.h |   8 +-
 linux-headers/linux/vfio.h| 406 +-
 linux-headers/linux/vfio_zdev.h   |   7 +
 linux-headers/linux/vhost.h   |   7 +
 target/s390x/cpu_features_def.h.inc   |   1 +
 target/s390x/gen-features.c   |   2 +
 target/s390x/kvm/kvm.c|   8 +
 target/s390x/kvm/kvm_s390x.h  |   1 +
 28 files changed, 763 insertions(+), 258 deletions(-)
 create mode 100644 hw/s390x/s390-pci-kvm.c
 create mode 100644 include/hw/s390x/s390-pci-kvm.h

-- 
2.27.0

Re: [RFC PATCH] tests/qtest: attempt to enable tests for virtio-gpio (!working)

2022-04-04 Thread Dr. David Alan Gilbert

* Alex Bennée (alex.ben...@linaro.org) wrote:
> 
> (expanding the CC list for help, anyone have a better idea about how
> vhost-user qtests should work/see obvious issues with this patch?)

How exactly does it fail?

Dave

> Alex Bennée  writes:
> 
> > We don't have a virtio-gpio implementation in QEMU and only
> > support a vhost-user backend. The QEMU side of the code is minimal so
> > it should be enough to instantiate the device and pass some vhost-user
> > messages over the control socket. To do this we hook into the existing
> > vhost-user-test code and just add the bits required for gpio.
> >
> > Based-on: 20220118203833.316741-1-eric.au...@redhat.com
> > Signed-off-by: Alex Bennée 
> > Cc: Viresh Kumar 
> > Cc: Paolo Bonzini 
> >
> > ---
> >
> > This goes as far as to add things to the QOS tree but so far it's
> > failing to properly start QEMU with the chardev socket needed to
> > communicate between the mock vhost-user daemon and QEMU itself.
> > ---
> >  tests/qtest/libqos/virtio-gpio.h | 34 +++
> >  tests/qtest/libqos/virtio-gpio.c | 98 
> >  tests/qtest/vhost-user-test.c| 34 +++
> >  tests/qtest/libqos/meson.build   |  1 +
> >  4 files changed, 167 insertions(+)
> >  create mode 100644 tests/qtest/libqos/virtio-gpio.h
> >  create mode 100644 tests/qtest/libqos/virtio-gpio.c
> >
> > diff --git a/tests/qtest/libqos/virtio-gpio.h 
> > b/tests/qtest/libqos/virtio-gpio.h
> > new file mode 100644
> > index 00..abe6967ae9
> > --- /dev/null
> > +++ b/tests/qtest/libqos/virtio-gpio.h
> > @@ -0,0 +1,34 @@
> > +/*
> > + * virtio-gpio structures
> > + *
> > + * Copyright (c) 2022 Linaro Ltd
> > + *
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + */
> > +
> > +#ifndef TESTS_LIBQOS_VIRTIO_GPIO_H
> > +#define TESTS_LIBQOS_VIRTIO_GPIO_H
> > +
> > +#include "qgraph.h"
> > +#include "virtio.h"
> > +#include "virtio-pci.h"
> > +
> > +typedef struct QVhostUserGPIO QVhostUserGPIO;
> > +typedef struct QVhostUserGPIOPCI QVhostUserGPIOPCI;
> > +typedef struct QVhostUserGPIODevice QVhostUserGPIODevice;
> > +
> > +struct QVhostUserGPIO {
> > +QVirtioDevice *vdev;
> > +};
> > +
> > +struct QVhostUserGPIOPCI {
> > +QVirtioPCIDevice pci_vdev;
> > +QVhostUserGPIO gpio;
> > +};
> > +
> > +struct QVhostUserGPIODevice {
> > +QOSGraphObject obj;
> > +QVhostUserGPIO gpio;
> > +};
> > +
> > +#endif
> > diff --git a/tests/qtest/libqos/virtio-gpio.c 
> > b/tests/qtest/libqos/virtio-gpio.c
> > new file mode 100644
> > index 00..62c8074777
> > --- /dev/null
> > +++ b/tests/qtest/libqos/virtio-gpio.c
> > @@ -0,0 +1,98 @@
> > +/*
> > + * virtio-gpio nodes for testing
> > + *
> > + * Copyright (c) 2022 Linaro Ltd
> > + *
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "libqtest.h"
> > +#include "qemu/module.h"
> > +#include "qgraph.h"
> > +#include "virtio-gpio.h"
> > +
> > +static void *qvirtio_gpio_get_driver(QVhostUserGPIO *v_gpio,
> > +   const char *interface)
> > +{
> > +if (!g_strcmp0(interface, "vhost-user-gpio")) {
> > +return v_gpio;
> > +}
> > +if (!g_strcmp0(interface, "virtio")) {
> > +return v_gpio->vdev;
> > +}
> > +
> > +fprintf(stderr, "%s not present in virtio-gpio-device\n", interface);
> > +g_assert_not_reached();
> > +}
> > +
> > +static void *qvirtio_gpio_device_get_driver(void *object,
> > +  const char *interface)
> > +{
> > +QVhostUserGPIODevice *v_gpio = object;
> > +return qvirtio_gpio_get_driver(&v_gpio->gpio, interface);
> > +}
> > +
> > +static void *virtio_gpio_device_create(void *virtio_dev,
> > + QGuestAllocator *t_alloc,
> > + void *addr)
> > +{
> > +QVhostUserGPIODevice *virtio_device = g_new0(QVhostUserGPIODevice, 1);
> > +QVhostUserGPIO *interface = &virtio_device->gpio;
> > +
> > +interface->vdev = virtio_dev;
> > +
> > +virtio_device->obj.get_driver = qvirtio_gpio_device_get_driver;
> > +
> > +return &virtio_device->obj;
> > +}
> > +
> > +/* virtio-gpio-pci */
> > +static void *qvirtio_gpio_pci_get_driver(void *object, const char 
> > *interface)
> > +{
> > +QVhostUserGPIOPCI *v_gpio = object;
> > +if (!g_strcmp0(interface, "pci-device")) {
> > +return v_gpio->pci_vdev.pdev;
> > +}
> > +return qvirtio_gpio_get_driver(&v_gpio->gpio, interface);
> > +}
> > +
> > +static void *virtio_gpio_pci_create(void *pci_bus, QGuestAllocator 
> > *t_alloc,
> > +  void *addr)
> > +{
> > +QVhostUserGPIOPCI *virtio_spci = g_new0(QVhostUserGPIOPCI, 1);
> > +QVhostUserGPIO *interface = &virtio_spci->gpio;
> > +QOSGraphObject *obj = &virtio_spci->pci_vdev.obj;
> > +
> > +virtio_pci_init(&virtio_spci->pci_vdev, pci_bus, addr);
> > +interface->vdev = &virtio_spci->pci_vdev.vdev;
> > +
> > +obj->get_driver

Re: [PATCH] tests/qtest: failover: fix infinite loop

2022-04-04 Thread Dr. David Alan Gilbert

* Laurent Vivier (lviv...@redhat.com) wrote:
> If the migration is over before we cancel it, we are
> waiting in a loop for a state that never comes because the state
> is already "completed".
> 
> To avoid an infinite loop, skip the test if the migration
> is "completed" before we were able to cancel it.
> 
> Signed-off-by: Laurent Vivier 

If you're finding it's skipping too often, you might try setting the
migration bandwidth really low right at the start (a few bytes/second)
to ensure it doesn't complete under your feet.

Dave

> ---
>  tests/qtest/virtio-net-failover.c | 29 +
>  1 file changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/qtest/virtio-net-failover.c 
> b/tests/qtest/virtio-net-failover.c
> index 80292eecf65f..78811f1c9216 100644
> --- a/tests/qtest/virtio-net-failover.c
> +++ b/tests/qtest/virtio-net-failover.c
> @@ -1141,6 +1141,11 @@ static void test_migrate_guest_off_abort(gconstpointer 
> opaque)
>  ret = migrate_status(qts);
>  
>  status = qdict_get_str(ret, "status");
> +if (strcmp(status, "completed") == 0) {
> +g_test_skip("Failed to cancel the migration");
> +qobject_unref(ret);
> +goto out;
> +}
>  if (strcmp(status, "active") == 0) {
>  qobject_unref(ret);
>  break;
> @@ -1155,8 +1160,12 @@ static void test_migrate_guest_off_abort(gconstpointer 
> opaque)
>  
>  while (true) {
>  ret = migrate_status(qts);
> -
>  status = qdict_get_str(ret, "status");
> +if (strcmp(status, "completed") == 0) {
> +g_test_skip("Failed to cancel the migration");
> +qobject_unref(ret);
> +goto out;
> +}
>  if (strcmp(status, "cancelled") == 0) {
>  qobject_unref(ret);
>  break;
> @@ -1169,6 +1178,7 @@ static void test_migrate_guest_off_abort(gconstpointer 
> opaque)
>  check_one_card(qts, true, "standby0", MAC_STANDBY0);
>  check_one_card(qts, false, "primary0", MAC_PRIMARY0);
>  
> +out:
>  qos_object_destroy((QOSGraphObject *)vdev);
>  machine_stop(qts);
>  }
> @@ -1251,8 +1261,7 @@ static void 
> test_migrate_abort_wait_unplug(gconstpointer opaque)
>  qobject_unref(ret);
>  break;
>  }
> -g_assert_cmpstr(status, !=, "failed");
> -g_assert_cmpstr(status, !=, "active");
> +g_assert_cmpstr(status, ==, "cancelling");
>  qobject_unref(ret);
>  }
>  
> @@ -1324,11 +1333,11 @@ static void test_migrate_abort_active(gconstpointer 
> opaque)
>  ret = migrate_status(qts);
>  
>  status = qdict_get_str(ret, "status");
> +g_assert_cmpstr(status, !=, "failed");
>  if (strcmp(status, "wait-unplug") != 0) {
>  qobject_unref(ret);
>  break;
>  }
> -g_assert_cmpstr(status, !=, "failed");
>  qobject_unref(ret);
>  }
>  
> @@ -1340,6 +1349,11 @@ static void test_migrate_abort_active(gconstpointer 
> opaque)
>  ret = migrate_status(qts);
>  
>  status = qdict_get_str(ret, "status");
> +if (strcmp(status, "completed") == 0) {
> +g_test_skip("Failed to cancel the migration");
> +qobject_unref(ret);
> +goto out;
> +}
>  if (strcmp(status, "cancelled") == 0) {
>  qobject_unref(ret);
>  break;
> @@ -1352,6 +1366,7 @@ static void test_migrate_abort_active(gconstpointer 
> opaque)
>  check_one_card(qts, true, "standby0", MAC_STANDBY0);
>  check_one_card(qts, true, "primary0", MAC_PRIMARY0);
>  
> +out:
>  qos_object_destroy((QOSGraphObject *)vdev);
>  machine_stop(qts);
>  }
> @@ -1425,6 +1440,11 @@ static void test_migrate_off_abort(gconstpointer 
> opaque)
>  ret = migrate_status(qts);
>  
>  status = qdict_get_str(ret, "status");
> +if (strcmp(status, "completed") == 0) {
> +g_test_skip("Failed to cancel the migration");
> +qobject_unref(ret);
> +goto out;
> +}
>  if (strcmp(status, "cancelled") == 0) {
>  qobject_unref(ret);
>  break;
> @@ -1437,6 +1457,7 @@ static void test_migrate_off_abort(gconstpointer opaque)
>  check_one_card(qts, true, "standby0", MAC_STANDBY0);
>  check_one_card(qts, true, "primary0", MAC_PRIMARY0);
>  
> +out:
>  qos_object_destroy((QOSGraphObject *)vdev);
>  machine_stop(qts);
>  }
> -- 
> 2.35.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
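
The fix discussed above boils down to treating "completed" as a terminal
state while polling for another one. A minimal, self-contained sketch of
that idea follows. It is not the qtest code: wait_for_status(), the
StatusFn callback, the Replay helper and the max_polls bound are all
hypothetical names and additions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { WAIT_REACHED, WAIT_SKIPPED, WAIT_TIMEOUT } WaitResult;

/* Hypothetical status source: returns the current status on each call. */
typedef const char *(*StatusFn)(void *opaque);

/*
 * Poll until the wanted status shows up. "completed" is terminal, so
 * seeing it means the wanted status can never arrive and the caller
 * should skip instead of spinning. max_polls is an extra safety bound
 * that the patch itself does not have.
 */
static WaitResult wait_for_status(StatusFn next_status, void *opaque,
                                  const char *wanted, size_t max_polls)
{
    assert(next_status != NULL && wanted != NULL);

    for (size_t i = 0; i < max_polls; i++) {
        const char *status = next_status(opaque);

        if (strcmp(status, wanted) == 0) {
            return WAIT_REACHED;
        }
        if (strcmp(status, "completed") == 0) {
            return WAIT_SKIPPED;
        }
    }
    return WAIT_TIMEOUT;
}

/* Replays a non-empty, NULL-terminated list and repeats its last entry. */
typedef struct {
    const char *const *seq;
    size_t pos;
} Replay;

static const char *replay_next(void *opaque)
{
    Replay *r = opaque;
    const char *status = r->seq[r->pos];

    if (r->seq[r->pos + 1] != NULL) {
        r->pos++;
    }
    return status;
}
```

With a replayed sequence of "active", "completed", a wait for "cancelled"
returns WAIT_SKIPPED rather than looping, which mirrors the
g_test_skip() path added by the patch.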

[PATCH v2] target/riscv: Fix incorrect PTE merge in walk_pte

2022-04-04 Thread Ralf Ramsauer

Two non-subsequent PTEs can be mapped to subsequent paddrs. In this
case, walk_pte will erroneously merge them.

Enforce the split up, by tracking the virtual base address.

Let's say we have the mapping:
0x8120 -> 0x89623000 (4K)
0x8120f000 -> 0x89624000 (4K)

Before, walk_pte would have shown:

vaddr    paddr    size attr
-------- -------- ---- -------
8120     89623000 2000 rwxu-ad

as it only checks for subsequent paddrs. With this patch, it becomes:

vaddr    paddr    size attr
-------- -------- ---- -------
8120     89623000 1000 rwxu-ad
8120f000 89624000 1000 rwxu-ad

Signed-off-by: Ralf Ramsauer 
---
 target/riscv/monitor.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/riscv/monitor.c b/target/riscv/monitor.c
index 7efb4b62c1..9dc4cb1156 100644
--- a/target/riscv/monitor.c
+++ b/target/riscv/monitor.c
@@ -84,6 +84,7 @@ static void walk_pte(Monitor *mon, hwaddr base, target_ulong 
start,
 {
 hwaddr pte_addr;
 hwaddr paddr;
+target_ulong last_start = -1;
 target_ulong pgsize;
 target_ulong pte;
 int ptshift;
@@ -116,7 +117,8 @@ static void walk_pte(Monitor *mon, hwaddr base, 
target_ulong start,
  * contiguous mapped block details.
  */
 if ((*last_attr != attr) ||
-(*last_paddr + *last_size != paddr)) {
+(*last_paddr + *last_size != paddr) ||
+(last_start + *last_size != start)) {
 print_pte(mon, va_bits, *vbase, *pbase,
   *last_paddr + *last_size - *pbase, *last_attr);
 
@@ -125,6 +127,7 @@ static void walk_pte(Monitor *mon, hwaddr base, 
target_ulong start,
 *last_attr = attr;
 }
 
+last_start = start;
 *last_paddr = paddr;
 *last_size = pgsize;
 } else {
-- 
2.35.1
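
For readers following the thread, the merge rule the patch aims for can be
sketched outside of QEMU. This is a simplified illustration, not the
monitor code: the Mapping struct, can_merge() and count_ranges() are
hypothetical, and the addresses used with them below are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mapping record; not QEMU's representation. */
typedef struct {
    uint64_t vaddr;
    uint64_t paddr;
    uint64_t size;
    unsigned attr;
} Mapping;

/*
 * Two neighbouring mappings form one printed range only if the
 * attributes match and BOTH the physical and the virtual addresses
 * are contiguous. Checking paddr alone is the bug being fixed.
 */
static bool can_merge(const Mapping *prev, const Mapping *next)
{
    return prev->attr == next->attr &&
           prev->paddr + prev->size == next->paddr &&
           prev->vaddr + prev->size == next->vaddr;
}

/* Count the ranges that would be printed for n mappings in walk order. */
static size_t count_ranges(const Mapping *m, size_t n)
{
    size_t ranges = 0;

    assert(m != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || !can_merge(&m[i - 1], &m[i])) {
            ranges++;
        }
    }
    return ranges;
}
```

Two 4K mappings with adjacent paddrs but a gap between their vaddrs count
as two ranges here, while the same pair with adjacent vaddrs counts as one.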

Re: [PATCH] target/riscv: Fix incorrect PTE merge in walk_pte

2022-04-04 Thread Ralf Ramsauer





On 01/04/2022 14:22, Ralf Ramsauer wrote:

Two non-subsequent PTEs can be mapped to subsequent paddrs. In this
case, walk_pte will erroneously merge them.

Enforce the split up, by tracking the virtual base address.

Let's say we have the mapping:
0x8120 -> 0x89623000 (4K)
0x8120f000 -> 0x89624000 (4K)

Before, walk_pte would have shown:

vaddr    paddr    size attr
-------- -------- ---- -------
8120     89623000 2000 rwxu-ad

as it only checks for subsequent paddrs. With this patch, it becomes:

vaddr    paddr    size attr
-------- -------- ---- -------
8120     89623000 1000 rwxu-ad
8120f000 89624000 1000 rwxu-ad

Signed-off-by: Ralf Ramsauer 
---
  target/riscv/monitor.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/riscv/monitor.c b/target/riscv/monitor.c
index 7efb4b62c1..60e3edd0ad 100644
--- a/target/riscv/monitor.c
+++ b/target/riscv/monitor.c
@@ -84,6 +84,7 @@ static void walk_pte(Monitor *mon, hwaddr base, target_ulong 
start,
  {
  hwaddr pte_addr;
  hwaddr paddr;
+target_ulong last_start = -1;
  target_ulong pgsize;
  target_ulong pte;
  int ptshift;
@@ -116,13 +117,15 @@ static void walk_pte(Monitor *mon, hwaddr base, 
target_ulong start,
   * contiguous mapped block details.
   */
  if ((*last_attr != attr) ||
-(*last_paddr + *last_size != paddr)) {
+(*last_paddr + *last_size != paddr) ||
+(last_start + *last_size != start)) {
  print_pte(mon, va_bits, *vbase, *pbase,
*last_paddr + *last_size - *pbase, *last_attr);
  
  *vbase = start;

  *pbase = paddr;
  *last_attr = attr;
+last_start = start;
  }


Yikes, there's a small bug in my patch that I failed to see:
last_start = start should be outside the curly brackets, otherwise it
will rip up too many regions.


I'll return with a V2.

Thanks
  Ralf

  
  *last_paddr = paddr;

Re: [PATCH] multifd: Copy pages before compressing them with zlib

2022-04-04 Thread Dr. David Alan Gilbert

* Ilya Leoshkevich (i...@linux.ibm.com) wrote:
> On Mon, 2022-04-04 at 12:20 +0100, Dr. David Alan Gilbert wrote:
> > * Ilya Leoshkevich (i...@linux.ibm.com) wrote:
> > > zlib_send_prepare() compresses pages of a running VM. zlib does not
> > > make any thread-safety guarantees with respect to changing
> > > deflate()
> > > input concurrently with deflate() [1].
> > > 
> > > One can observe problems due to this with the IBM zEnterprise Data
> > > Compression accelerator capable zlib [2]. When the hardware
> > > acceleration is enabled, migration/multifd/tcp/zlib test fails
> > > intermittently [3] due to sliding window corruption.
> > > 
> > > At the moment this problem occurs only with this accelerator, since
> > > its architecture explicitly discourages concurrent accesses [4]:
> > > 
> > >     Page 26-57, "Other Conditions":
> > > 
> > >     As observed by this CPU, other CPUs, and channel
> > >     programs, references to the parameter block, first,
> > >     second, and third operands may be multiple-access
> > >     references, accesses to these storage locations are
> > >     not necessarily block-concurrent, and the sequence
> > >     of these accesses or references is undefined.
> > > 
> > > Still, it might affect other platforms due to a future zlib update.
> > > Therefore, copy the page being compressed into a private buffer
> > > before
> > > passing it to zlib.
> > 
> > While this might work around the problem; your explanation doesn't
> > quite
> > fit with the symptoms; or if they do, then you have a separate
> > problem.
> > 
> > The live migration code relies on the fact that the source is running
> > and changing its memory as the data is transmitted; however it also
> > relies on the fact that if this happens the 'dirty' flag is set
> > _after_
> > those changes causing another round of migration and retransmission
> > of
> > the (now stable) data.
> > 
> > We don't expect the load of the data for the first page write to be
> > correct, consistent etc - we just rely on the retransmission to be
> > correct when the page is stable.
> > 
> > If your compressor hardware is doing something undefined during the
> > first case that's fine; as long as it works fine in the stable case
> > where the data isn't changing.
> > 
> > Adding the extra copy is going to slow everyone else down; and since
> > there's plenty of pthread locking in those multifd I'm expecting them
> > to get reasonably defined ordering and thus be safe from multi
> > threading
> > problems (please correct us if we've actually done something wrong in
> > the locking there).
> > 
> > IMHO your accelerator when called from a zlib call needs to behave
> > the same as if it was the software implementation; i.e. if we've got
> > pthread calls in there that are enforcing ordering then that should
> > be
> > fine; your accelerator implementation needs to add a barrier of some
> > type or an internal copy, not penalise everyone else.
> > 
> > Dave
> 
> The problem with the accelerator is that during the first case the
> internal state might end up being corrupted (in particular: what goes
> into the deflate stream differs from what goes into the sliding
> window). This may affect the data integrity in the second case later
> on.

Hmm I hadn't expected the unpredictability to span multiple blocks.

> I've been trying to think what to do with that, and of course doing an
> internal copy is one option (a barrier won't suffice). However, I
> realized that zlib API as documented doesn't guarantee that it's safe
> to change input data concurrently with compression. On the other hand,
> today's zlib is implemented in a way that tolerates this.
> 
> So the open question for me is, whether we should honor zlib
> documentation (in which case, I would argue, QEMU needs to be changed)
> or say that the behavior of today's zlib implementation is more
> important (in which case accelerator code needs to change). I went with
> the former for now, but the latter is of course doable as well.

Well I think you're saying that the current docs don't specify and
thus assume that there's a constraint.

I think the right people to answer this are the zlib community; so
can you send a mail to zlib-devel and ask?

Dave

-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
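
To make the proposal under discussion concrete, here is a rough sketch of
"copy the page into a private buffer, then hand only the copy to the
consumer". It deliberately avoids zlib so that it stays self-contained.
process_private_copy(), ConsumeFn, RacyConsumer and SKETCH_PAGE_SIZE are
hypothetical names, and the consumer merely stands in for deflate().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Stand-in for a compressor that must see stable input while it runs. */
typedef void (*ConsumeFn)(const uint8_t *buf, size_t len, void *opaque);

/*
 * Snapshot the live page into caller-provided scratch space and pass
 * only the snapshot on. Later writes to live_page cannot change what
 * the consumer reads, at the cost of one memcpy() per page.
 */
static void process_private_copy(const uint8_t *live_page, uint8_t *scratch,
                                 size_t len, ConsumeFn consume, void *opaque)
{
    assert(len <= SKETCH_PAGE_SIZE);
    memcpy(scratch, live_page, len);
    consume(scratch, len, opaque);
}

/* Test consumer that simulates a guest write while it is running. */
typedef struct {
    uint8_t *live_page;   /* page the "guest" keeps modifying */
    uint8_t seen_first;   /* first byte the consumer observed */
} RacyConsumer;

static void racy_consume(const uint8_t *buf, size_t len, void *opaque)
{
    RacyConsumer *c = opaque;

    c->live_page[0] ^= 0xff;            /* concurrent modification */
    c->seen_first = len ? buf[0] : 0;   /* read from the private copy */
}
```

The snapshot itself may still capture a page that is being written to;
as explained earlier in the thread, that case is expected to be corrected
by the dirty-page retransmission. The sketch only shows that the consumer
sees a buffer that no longer changes underneath it.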

Re: [PATCH for-7.1] hw/arm/virt: Check for attempt to use TrustZone with KVM or HVF

2022-04-04 Thread Richard Henderson


On 4/4/22 10:53, Peter Maydell wrote:

It's not possible to provide the guest with the Security extensions
(TrustZone) when using KVM or HVF, because the hardware
virtualization extensions don't permit running EL3 guest code.
However, we weren't checking for this combination, with the result
that QEMU would assert if you tried it:

$ qemu-system-aarch64 -enable-kvm -machine virt,secure=on -cpu host -display none
Unexpected error in object_property_find_err() at ../../qom/object.c:1304:
qemu-system-aarch64: Property 'host-arm-cpu.secure-memory' not found
Aborted

Check for this combination of options and report an error, in the
same way we already do for attempts to give a KVM or HVF guest the
Virtualization or MTE extensions. Now we will report:

qemu-system-aarch64: mach-virt: KVM does not support providing Security extensions (TrustZone) to the guest CPU

Signed-off-by: Peter Maydell
---
Not a regression, so not worth fixing in 7.0.
---
  hw/arm/virt.c | 7 +++
  1 file changed, 7 insertions(+)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-04 Thread Sean Christopherson

On Mon, Apr 04, 2022, Quentin Perret wrote:
> On Friday 01 Apr 2022 at 12:56:50 (-0700), Andy Lutomirski wrote:
> FWIW, there are a couple of reasons why I'd like to have in-place
> conversions:
> 
>  - one goal of pKVM is to migrate some things away from the Arm
>Trustzone environment (e.g. DRM and the likes) and into protected VMs
>instead. This will give Linux a fighting chance to defend itself
>against these things -- they currently have access to _all_ memory.
>And transitioning pages between Linux and Trustzone (donations and
>shares) is fast and non-destructive, so we really do not want pKVM to
>regress by requiring the hypervisor to memcpy things;

Is there actually a _need_ for the conversion to be non-destructive?  E.g. I
assume the "trusted" side of things will need to be reworked to run as a pKVM
guest, at which point reworking its logic to understand that conversions are
destructive and slow-ish doesn't seem too onerous.

>  - it can be very useful for protected VMs to do shared=>private
>conversions. Think of a VM receiving some data from the host in a
>shared buffer, and then it wants to operate on that buffer without
>risking to leak confidential information in a transient state. In
>that case the most logical thing to do is to convert the buffer back
>to private, do whatever needs to be done on that buffer (decrypting a
>frame, ...), and then share it back with the host to consume it;

If performance is a motivation, why would the guest want to do two conversions
instead of just doing internal memcpy() to/from a private page?  I would be
quite surprised if multiple exits and TLB shootdowns are actually faster,
especially at any kind of scale where zapping stage-2 PTEs will cause lock
contention and IPIs.

>  - similar to the previous point, a protected VM might want to
>temporarily turn a buffer private to avoid ToCToU issues;

Again, bounce buffer the page in the guest.

>  - once we're able to do device assignment to protected VMs, this might
>allow DMA-ing to a private buffer, and make it shared later w/o
>bouncing.

Exposing a private buffer to a device doesn't require in-place conversion.  The
proper way to handle this would be to teach e.g. VFIO to retrieve the PFN from
the backing store.  I don't understand the use case for sharing a DMA'd page at
a later time; with whom would the guest share the page?  E.g. if a NIC has
access to guest private data then there should never be a need to
convert/bounce the page.

Re: [PATCH v1 8/9] qom: add command to print initial properties

2022-04-04 Thread Maxim Davydov




On 3/31/22 14:55, Igor Mammedov wrote:

On Tue, 29 Mar 2022 00:15:38 +0300
Maxim Davydov  wrote:


The command "query-init-properties" is needed to get values of properties
after initialization (not only default value). It makes sense, for example,
when working with x86_64-cpu.
All machine types (and x-remote-object, because its init uses machine
type's infrastructure) should be skipped, because only the one instance can
be correctly initialized.

It might be obvious to you but I couldn't parse the above commit message at all.
Pls rephrase and explain in more detail what you are trying to achieve.

I want to dump all "default" object properties to compare them with the
compat_props of MachineState. That means I need values from ObjectProperty
even if it doesn't have a default value. For many devices this can give
useless information. But, for example, x86_64-cpu sets "real" default values
for a specific model only during initialization. x86_cpu_properties[] can't
give information about kvm default features.



Signed-off-by: Maxim Davydov 
---
  qapi/qom.json  |  69 ++
  qom/qom-qmp-cmds.c | 121 +
  2 files changed, 190 insertions(+)

diff --git a/qapi/qom.json b/qapi/qom.json
index eeb5395ff3..1eedc441eb 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -949,3 +949,72 @@
  ##
  { 'command': 'object-del', 'data': {'id': 'str'},
'allow-preconfig': true }
+
+##
+# @InitValue:
+#
+# Not all objects have default values but they have "initial" values.
+#
+# @name: property name
+#
+# @value: Current value (default or after initialization. It makes sense,
+# for example, for x86-cpus)
+#
+# Since: 7.0
+#
+##
+{ 'struct': 'InitValue',
+  'data': { 'name': 'str',
+'*value': 'any' } }
+
+##
+# @ClassProperties:
+#
+# Initial values of properties that are owned by the class
+#
+# @classname: name of the class that owns appropriate properties
+#
+# @classprops: List of class properties
+#
+# Since: 7.0
+#
+##
+{ 'struct': 'ClassProperties',
+  'data': { 'classname': 'str',
+'*classprops': [ 'InitValue' ] } }
+
+##
+# @InitProps:
+#
+# List of properties and their values that are available after class
+# initialization. So it is important to know the default value of the property
+# even if it doesn't have "QObject *defval"
+#
+# @name: Object name
+#
+# @props: List of properties
+#
+# Notes: a value in each property was defval if it's available
+#otherwise it's obtained via "(ObjectPropertyAccessor*) get"
+#immediately after initialization of device object.
+#
+# Since: 7.0
+#
+##
+{ 'struct': 'InitProps',
+  'data': { 'name': 'str',
+'props': [ 'ClassProperties' ] } }
+
+##
+# @query-init-properties:
+#
+# Returns list of all objects (except all types related with machine type)
+# with all properties and their "default" values that  will be available
+# after initialization. The main purpose of this command is to be used to
+# build table with all machine-type-specific properties
+#
+# Since: 7.0
+#
+##
+{ 'command': 'query-init-properties',
+  'returns': [ 'InitProps' ] }
diff --git a/qom/qom-qmp-cmds.c b/qom/qom-qmp-cmds.c
index 2d6f41ecc7..c1bb3f1f8b 100644
--- a/qom/qom-qmp-cmds.c
+++ b/qom/qom-qmp-cmds.c
@@ -27,6 +27,7 @@
  #include "qemu/cutils.h"
  #include "qom/object_interfaces.h"
  #include "qom/qom-qobject.h"
+#include "hw/boards.h"
  
  ObjectPropertyInfoList *qmp_qom_list(const char *path, Error **errp)

  {
@@ -235,3 +236,123 @@ void qmp_object_del(const char *id, Error **errp)
  {
  user_creatable_del(id, errp);
  }
+
+static void query_object_prop(InitValueList **props_list, ObjectProperty *prop,
+  Object *obj, Error **errp)
+{
+InitValue *prop_info = NULL;
+
+/* Skip inconsiderable properties */
+if (strcmp(prop->name, "type") == 0 ||
+strcmp(prop->name, "realized") == 0 ||
+strcmp(prop->name, "hotpluggable") == 0 ||
+strcmp(prop->name, "hotplugged") == 0 ||
+strcmp(prop->name, "parent_bus") == 0) {
+return;
+}
+
+prop_info = g_malloc0(sizeof(*prop_info));
+prop_info->name = g_strdup(prop->name);
+prop_info->value = NULL;
+if (prop->defval) {
+prop_info->value = qobject_ref(prop->defval);
+} else if (prop->get) {
+/*
+ * crash-information in x86-cpu uses errp to return current state.
+ * So, after requesting this property it returns  GenericError:
+ * "No crash occured"
+ */
+if (strcmp(prop->name, "crash-information") != 0) {
+prop_info->value = object_property_get_qobject(obj, prop->name,
+   errp);
+}
+}
+prop_info->has_value = !!prop_info->value;
+
+QAPI_LIST_PREPEND(*props_list, prop_info);
+}
+
+typedef struct QIPData {
+InitPropsList **dev_list;
+Error **errp;
+} QIPData;
+
+static void

[PATCH for-7.1 08/18] hw/arm/exynos4210: Put external GIC into state struct

2022-04-04 Thread Peter Maydell

Switch the creation of the external GIC to the new-style "embedded in
state struct" approach, so we can easily refer to the object
elsewhere during realize.

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h  |  2 ++
 include/hw/intc/exynos4210_gic.h | 43 
 hw/arm/exynos4210.c  | 10 
 hw/intc/exynos4210_gic.c | 17 ++---
 MAINTAINERS  |  2 +-
 5 files changed, 53 insertions(+), 21 deletions(-)
 create mode 100644 include/hw/intc/exynos4210_gic.h

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index d83e96a091e..f35ae9f 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -27,6 +27,7 @@
 #include "hw/or-irq.h"
 #include "hw/sysbus.h"
 #include "hw/cpu/a9mpcore.h"
+#include "hw/intc/exynos4210_gic.h"
 #include "target/arm/cpu-qom.h"
 #include "qom/object.h"
 
@@ -103,6 +104,7 @@ struct Exynos4210State {
 qemu_or_irq pl330_irq_orgate[EXYNOS4210_NUM_DMA];
 qemu_or_irq cpu_irq_orgate[EXYNOS4210_NCPUS];
 A9MPPrivState a9mpcore;
+Exynos4210GicState ext_gic;
 };
 
 #define TYPE_EXYNOS4210_SOC "exynos4210"
diff --git a/include/hw/intc/exynos4210_gic.h b/include/hw/intc/exynos4210_gic.h
new file mode 100644
index 000..f64c4069c6d
--- /dev/null
+++ b/include/hw/intc/exynos4210_gic.h
@@ -0,0 +1,43 @@
+/*
+ * Samsung exynos4210 GIC implementation. Based on hw/arm_gic.c
+ *
+ * Copyright (c) 2000 - 2011 Samsung Electronics Co., Ltd.
+ * All rights reserved.
+ *
+ * Evgeny Voevodin 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+#ifndef HW_INTC_EXYNOS4210_GIC_H
+#define HW_INTC_EXYNOS4210_GIC_H
+
+#include "hw/sysbus.h"
+
+#define TYPE_EXYNOS4210_GIC "exynos4210.gic"
+OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210GicState, EXYNOS4210_GIC)
+
+#define EXYNOS4210_GIC_NCPUS 2
+
+struct Exynos4210GicState {
+SysBusDevice parent_obj;
+
+MemoryRegion cpu_container;
+MemoryRegion dist_container;
+MemoryRegion cpu_alias[EXYNOS4210_GIC_NCPUS];
+MemoryRegion dist_alias[EXYNOS4210_GIC_NCPUS];
+uint32_t num_cpu;
+DeviceState *gic;
+};
+
+#endif
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 742666ba779..2058df9aecf 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -455,10 +455,9 @@ static void exynos4210_realize(DeviceState *socdev, Error 
**errp)
 sysbus_create_simple("l2x0", EXYNOS4210_L2X0_BASE_ADDR, NULL);
 
 /* External GIC */
-dev = qdev_new("exynos4210.gic");
-qdev_prop_set_uint32(dev, "num-cpu", EXYNOS4210_NCPUS);
-busdev = SYS_BUS_DEVICE(dev);
-sysbus_realize_and_unref(busdev, &error_fatal);
+qdev_prop_set_uint32(DEVICE(&s->ext_gic), "num-cpu", EXYNOS4210_NCPUS);
+busdev = SYS_BUS_DEVICE(&s->ext_gic);
+sysbus_realize(busdev, &error_fatal);
 /* Map CPU interface */
 sysbus_mmio_map(busdev, 0, EXYNOS4210_EXT_GIC_CPU_BASE_ADDR);
 /* Map Distributer interface */
@@ -468,7 +467,7 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 1));
 }
 for (n = 0; n < EXYNOS4210_EXT_GIC_NIRQ; n++) {
-s->irqs.ext_gic_irq[n] = qdev_get_gpio_in(dev, n);
+s->irqs.ext_gic_irq[n] = qdev_get_gpio_in(DEVICE(&s->ext_gic), n);
 }
 
 /* Internal Interrupt Combiner */
@@ -686,6 +685,7 @@ static void exynos4210_init(Object *obj)
 }
 
 object_initialize_child(obj, "a9mpcore", &s->a9mpcore, TYPE_A9MPCORE_PRIV);
+object_initialize_child(obj, "ext-gic", &s->ext_gic, TYPE_EXYNOS4210_GIC);
 }
 
 static void exynos4210_class_init(ObjectClass *klass, void *data)
diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
index d8cad537fbf..71a88c86bc1 100644
--- a/hw/intc/exynos4210_gic.c
+++ b/hw/intc/exynos4210_gic.c
@@ -27,6 +27,7 @@
 #include "qemu/module.h"
 #include "hw/irq.h"
 #include "hw/qdev-properties.h"
+#include "hw/intc/exynos4210_gic.h"
 #include "hw/arm/exynos4210.h"
 #include "qom/object.h"
 
@@ -44,20 +45,6 @@
 #define EXYNOS4210_GIC_CPU_REGION_SIZE  0x100
 #define EXYNOS4210_GIC_DIST_REGION_SIZE 0x1000
 
-#define TYPE_EXYNOS4210_GIC "exynos4210.gic"
-OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210GicState, EXYNOS4210_GIC)
-
-struct Exynos4210GicState {
-SysBusDevice parent_obj;
-
-MemoryRegion cpu_container;
-MemoryRegion dist_container;
-MemoryRegion

[PATCH for-7.1 17/18] hw/arm/exynos4210: Put combiners into state struct

2022-04-04 Thread Peter Maydell

Switch the creation of the combiner devices to the new-style
"embedded in state struct" approach, so we can easily refer
to the object elsewhere during realize.
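
As a rough illustration of why embedding helps, here is a plain-C analogy. It
is not QEMU's QOM API: the struct and function names below are made up for
this sketch. The point it tries to show is that once the child devices are
members of the SoC state, any later code holding the SoC pointer can reach
them by name, without a temporary 'dev' variable.

```c
#include <assert.h>

/* Hypothetical stand-in for a combiner device; not a QEMU type. */
struct combiner {
    unsigned external;  /* 1 means this combiner is the external one */
    int realized;
};

/* Hypothetical SoC state with both combiners embedded as members. */
struct soc {
    struct combiner int_combiner;
    struct combiner ext_combiner;
};

/* Analogue of the init step: children are set up in place. */
void soc_init(struct soc *s)
{
    s->int_combiner = (struct combiner){ .external = 0, .realized = 0 };
    s->ext_combiner = (struct combiner){ .external = 0, .realized = 0 };
}

/* Analogue of realize: properties are set, then children are realized. */
void soc_realize(struct soc *s)
{
    assert(!s->int_combiner.realized && !s->ext_combiner.realized);
    s->ext_combiner.external = 1;
    s->int_combiner.realized = 1;
    s->ext_combiner.realized = 1;
}

/* Convenience helper for trying the sketch out. */
struct soc make_realized_soc(void)
{
    struct soc s;
    soc_init(&s);
    soc_realize(&s);
    return s;
}
```

In the real patch the equivalent steps are object_initialize_child() in the
init function and sysbus_realize() in realize.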

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h   |  3 ++
 include/hw/intc/exynos4210_combiner.h | 57 +++
 hw/arm/exynos4210.c   | 20 +-
 hw/intc/exynos4210_combiner.c | 31 +--
 4 files changed, 72 insertions(+), 39 deletions(-)
 create mode 100644 include/hw/intc/exynos4210_combiner.h

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index f24617f681d..d38be8767b3 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -28,6 +28,7 @@
 #include "hw/sysbus.h"
 #include "hw/cpu/a9mpcore.h"
 #include "hw/intc/exynos4210_gic.h"
+#include "hw/intc/exynos4210_combiner.h"
 #include "hw/core/split-irq.h"
 #include "target/arm/cpu-qom.h"
 #include "qom/object.h"
@@ -105,6 +106,8 @@ struct Exynos4210State {
 qemu_or_irq cpu_irq_orgate[EXYNOS4210_NCPUS];
 A9MPPrivState a9mpcore;
 Exynos4210GicState ext_gic;
+Exynos4210CombinerState int_combiner;
+Exynos4210CombinerState ext_combiner;
 SplitIRQ splitter[EXYNOS4210_NUM_SPLITTERS];
 };
 
diff --git a/include/hw/intc/exynos4210_combiner.h b/include/hw/intc/exynos4210_combiner.h
new file mode 100644
index 000..429844fed41
--- /dev/null
+++ b/include/hw/intc/exynos4210_combiner.h
@@ -0,0 +1,57 @@
+/*
+ * Samsung exynos4210 Interrupt Combiner
+ *
+ * Copyright (c) 2000 - 2011 Samsung Electronics Co., Ltd.
+ * All rights reserved.
+ *
+ * Evgeny Voevodin 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_INTC_EXYNOS4210_COMBINER
+#define HW_INTC_EXYNOS4210_COMBINER
+
+#include "hw/sysbus.h"
+
+/*
+ * State for each output signal of internal combiner
+ */
+typedef struct CombinerGroupState {
+uint8_t src_mask;/* 1 - source enabled, 0 - disabled */
+uint8_t src_pending;/* Pending source interrupts before masking */
+} CombinerGroupState;
+
+#define TYPE_EXYNOS4210_COMBINER "exynos4210.combiner"
+OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210CombinerState, EXYNOS4210_COMBINER)
+
+/* Number of groups and total number of interrupts for the internal combiner */
+#define IIC_NGRP 64
+#define IIC_NIRQ (IIC_NGRP * 8)
+#define IIC_REGSET_SIZE 0x41
+
+struct Exynos4210CombinerState {
+SysBusDevice parent_obj;
+
+MemoryRegion iomem;
+
+struct CombinerGroupState group[IIC_NGRP];
+uint32_t reg_set[IIC_REGSET_SIZE];
+uint32_t icipsr[2];
+uint32_t external;  /* 1 means that this combiner is external */
+
+qemu_irq output_irq[IIC_NGRP];
+};
+
+#endif
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 05b28cf5905..27c6ab27123 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -624,25 +624,23 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 }
 
 /* Internal Interrupt Combiner */
-dev = qdev_new("exynos4210.combiner");
-busdev = SYS_BUS_DEVICE(dev);
-sysbus_realize_and_unref(busdev, &error_fatal);
+busdev = SYS_BUS_DEVICE(&s->int_combiner);
+sysbus_realize(busdev, &error_fatal);
 for (n = 0; n < EXYNOS4210_MAX_INT_COMBINER_OUT_IRQ; n++) {
 sysbus_connect_irq(busdev, n,
 qdev_get_gpio_in(DEVICE(&s->a9mpcore), n));
 }
-exynos4210_combiner_get_gpioin(&s->irqs, dev, 0);
+exynos4210_combiner_get_gpioin(&s->irqs, DEVICE(&s->int_combiner), 0);
 sysbus_mmio_map(busdev, 0, EXYNOS4210_INT_COMBINER_BASE_ADDR);
 
 /* External Interrupt Combiner */
-dev = qdev_new("exynos4210.combiner");
-qdev_prop_set_uint32(dev, "external", 1);
-busdev = SYS_BUS_DEVICE(dev);
-sysbus_realize_and_unref(busdev, &error_fatal);
+qdev_prop_set_uint32(DEVICE(&s->ext_combiner), "external", 1);
+busdev = SYS_BUS_DEVICE(&s->ext_combiner);
+sysbus_realize(busdev, &error_fatal);
 for (n = 0; n < EXYNOS4210_MAX_INT_COMBINER_OUT_IRQ; n++) {
 sysbus_connect_irq(busdev, n, qdev_get_gpio_in(DEVICE(&s->ext_gic), n));
 }
-exynos4210_combiner_get_gpioin(&s->irqs, dev, 1);
+exynos4210_combiner_get_gpioin(&s->irqs, DEVICE(&s->ext_combiner), 1);
 sysbus_mmio_map(busdev, 0, EXYNOS4210_EXT_COMBINER_BASE_ADDR);
 
 /* Initialize board IRQs. */
@@ -844,6 +842,10 @@ static void exynos4210_init(Object *obj)

[PATCH for-7.1 06/18] hw/arm/exynos4210: Fix code style nit in combiner_grp_to_gic_id[]

2022-04-04 Thread Peter Maydell

Fix a missing set of spaces around '-' in the definition of
combiner_grp_to_gic_id[]. We're about to move this code, so
fix the style issue first to keep checkpatch happy with the
code-motion patch.

Signed-off-by: Peter Maydell 
---
 hw/intc/exynos4210_gic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
index ec79b96f6d1..3b77a485780 100644
--- a/hw/intc/exynos4210_gic.c
+++ b/hw/intc/exynos4210_gic.c
@@ -121,7 +121,7 @@ enum ExtInt {
  */
 
 static const uint32_t
-combiner_grp_to_gic_id[64-EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
+combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
 /* int combiner groups 16-19 */
 { }, { }, { }, { },
 /* int combiner group 20 */
-- 
2.25.1

[PATCH for-7.1 18/18] hw/arm/exynos4210: Drop Exynos4210Irq struct

2022-04-04 Thread Peter Maydell

The only time we use the int_combiner_irq[] and ext_combiner_irq[]
arrays in the Exynos4210Irq struct is during realize of the SoC -- we
initialize them with the input IRQs of the combiner devices, and then
connect those to outputs of other devices in
exynos4210_init_board_irqs().  Now that the combiner objects are
easily accessible as s->int_combiner and s->ext_combiner we can make
the connections directly from one device to the other without going
via these arrays.

Since these are the only two remaining elements of Exynos4210Irq,
we can remove that struct entirely.

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h |  6 --
 hw/arm/exynos4210.c | 34 --
 2 files changed, 8 insertions(+), 32 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index d38be8767b3..97353f1c02f 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -82,17 +82,11 @@
  */
 #define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 38)
 
-typedef struct Exynos4210Irq {
-qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
-qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
-} Exynos4210Irq;
-
 struct Exynos4210State {
 /*< private >*/
 SysBusDevice parent_obj;
 /*< public >*/
 ARMCPU *cpu[EXYNOS4210_NCPUS];
-Exynos4210Irq irqs;
 qemu_irq irq_table[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 
 MemoryRegion chipid_mem;
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 27c6ab27123..8dafa2215b6 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -331,8 +331,9 @@ static int mapline_size(const int *mapline)
 static void exynos4210_init_board_irqs(Exynos4210State *s)
 {
 uint32_t grp, bit, irq_id, n;
-Exynos4210Irq *is = &s->irqs;
 DeviceState *extgicdev = DEVICE(&s->ext_gic);
+DeviceState *intcdev = DEVICE(&s->int_combiner);
+DeviceState *extcdev = DEVICE(&s->ext_combiner);
 int splitcount = 0;
 DeviceState *splitter;
 const int *mapline;
@@ -375,8 +376,10 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
 splitin = 0;
 for (;;) {
 s->irq_table[in] = qdev_get_gpio_in(splitter, 0);
-qdev_connect_gpio_out(splitter, splitin, is->int_combiner_irq[in]);
-qdev_connect_gpio_out(splitter, splitin + 1, is->ext_combiner_irq[in]);
+qdev_connect_gpio_out(splitter, splitin,
+  qdev_get_gpio_in(intcdev, in));
+qdev_connect_gpio_out(splitter, splitin + 1,
+  qdev_get_gpio_in(extcdev, in));
 splitin += 2;
 if (!mapline) {
 break;
@@ -414,11 +417,11 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
 qdev_realize(splitter, NULL, &error_abort);
 splitcount++;
 s->irq_table[n] = qdev_get_gpio_in(splitter, 0);
-qdev_connect_gpio_out(splitter, 0, is->int_combiner_irq[n]);
+qdev_connect_gpio_out(splitter, 0, qdev_get_gpio_in(intcdev, n));
 qdev_connect_gpio_out(splitter, 1,
   qdev_get_gpio_in(extgicdev, irq_id - 32));
 } else {
-s->irq_table[n] = is->int_combiner_irq[n];
+s->irq_table[n] = qdev_get_gpio_in(intcdev, n);
 }
 }
 /*
@@ -440,25 +443,6 @@ uint32_t exynos4210_get_irq(uint32_t grp, uint32_t bit)
 return EXYNOS4210_COMBINER_GET_IRQ_NUM(grp, bit);
 }
 
-/*
- * Get Combiner input GPIO into irqs structure
- */
-static void exynos4210_combiner_get_gpioin(Exynos4210Irq *irqs,
-   DeviceState *dev, int ext)
-{
-int n;
-int max;
-qemu_irq *irq;
-
-max = ext ? EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ :
-EXYNOS4210_MAX_INT_COMBINER_IN_IRQ;
-irq = ext ? irqs->ext_combiner_irq : irqs->int_combiner_irq;
-
-for (n = 0; n < max; n++) {
-irq[n] = qdev_get_gpio_in(dev, n);
-}
-}
-
 static uint8_t chipid_and_omr[] = { 0x11, 0x02, 0x21, 0x43,
 0x09, 0x00, 0x00, 0x00 };
 
@@ -630,7 +614,6 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 sysbus_connect_irq(busdev, n,
 qdev_get_gpio_in(DEVICE(&s->a9mpcore), n));
 }
-exynos4210_combiner_get_gpioin(&s->irqs, DEVICE(&s->int_combiner), 0);
 sysbus_mmio_map(busdev, 0, EXYNOS4210_INT_COMBINER_BASE_ADDR);
 
 /* External Interrupt Combiner */
@@ -640,7 +623,6 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 for (n = 0; n < EXYNOS4210_MAX_INT_COMBINER_OUT_IRQ; n++) {
 sysbus_connect_irq(busdev, n, qdev_get_gpio_in(DEVICE(&s->ext_gic), n));
 }
-exynos4210_combiner_get_gpioin(&s->irqs, DEVICE(&s->ext_combiner), 1);
 sysbus_mmio_map(busdev, 0, EXYNOS4210_EXT_COMBINER_BASE_ADDR);
 
 /* Initialize board IRQs. */
-- 
2.25.1

[PATCH for-7.1 15/18] hw/arm/exynos4210: Don't connect multiple lines to external GIC inputs

2022-04-04 Thread Peter Maydell

The combiner_grp_to_gic_id[] array includes the EXT_GIC_ID_MCT_G0
and EXT_GIC_ID_MCT_G1 multiple times. This means that we will
connect multiple IRQs up to the same external GIC input, which
is not permitted. We do the same thing in the code in
exynos4210_init_board_irqs() because the conditionals selecting
an irq_id in the first loop match multiple interrupt IDs.

Overall we do this for interrupt IDs
(1, 4), (12, 4), (35, 4), (51, 4), (53, 4) for EXT_GIC_ID_MCT_G0
and
(1, 5), (12, 5), (35, 5), (51, 5), (53, 5) for EXT_GIC_ID_MCT_G1

These correspond to the cases for the multi-core timer that we are
wiring up to multiple inputs on the combiner in
exynos4210_combiner_get_gpioin().  That code already deals with all
these interrupt IDs being the same input source, so we don't need to
connect the external GIC interrupt for any of them except the first
(1, 4) and (1, 5). Remove the array entries and conditionals which
were incorrectly causing us to wire up extra lines.

This bug didn't cause any visible effects, because we only connect
up a device to the "primary" ID values (1, 4) and (1, 5), so the
extra lines would never be set to a level.
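
To make the duplication concrete, here is a small standalone sketch (not part
of the patch). It uses the same (grp, bit) to input-number formula as
EXYNOS4210_COMBINER_GET_IRQ_NUM() and counts how many combiner inputs end up
driving one external GIC input. The GIC ID value, the struct and the helper
name are placeholders invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same formula as EXYNOS4210_COMBINER_GET_IRQ_NUM() in this series. */
#define COMBINER_IRQ_NUM(grp, bit) ((grp) * 8 + (bit))

/* Placeholder: the real EXT_GIC_ID_MCT_G0 value is not shown in this mail. */
#define DEMO_GIC_ID_MCT_G0 1000u

/* One wire from a combiner input number to an external GIC input ID. */
struct wire {
    unsigned combiner_in;
    unsigned gic_id;
};

/* Before the patch: five combiner inputs all drive the same GIC input. */
static const struct wire wires_before[] = {
    { COMBINER_IRQ_NUM(1, 4),  DEMO_GIC_ID_MCT_G0 },
    { COMBINER_IRQ_NUM(12, 4), DEMO_GIC_ID_MCT_G0 },
    { COMBINER_IRQ_NUM(35, 4), DEMO_GIC_ID_MCT_G0 },
    { COMBINER_IRQ_NUM(51, 4), DEMO_GIC_ID_MCT_G0 },
    { COMBINER_IRQ_NUM(53, 4), DEMO_GIC_ID_MCT_G0 },
};

/* After the patch: only the primary (1, 4) input is wired to the GIC. */
static const struct wire wires_after[] = {
    { COMBINER_IRQ_NUM(1, 4), DEMO_GIC_ID_MCT_G0 },
};

/* Count how many wires drive the given GIC input. */
size_t lines_driving(const struct wire *w, size_t n, unsigned gic_id)
{
    size_t count = 0;

    assert(w != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (w[i].gic_id == gic_id) {
            count++;
        }
    }
    return count;
}
```

A count greater than one for any GIC input is the condition this patch
removes; the same reasoning applies to MCT_G1 with bit 5.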

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h |  2 +-
 hw/arm/exynos4210.c | 12 +---
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index f58ee0f2686..7da3eddea5f 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -77,7 +77,7 @@
  * one for every non-zero entry in combiner_grp_to_gic_id[].
  * We'll assert in exynos4210_init_board_irqs() if this is wrong.
  */
-#define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 60)
+#define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 54)
 
 typedef struct Exynos4210Irq {
 qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 962d6d0ac2a..39e334e0773 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -231,7 +231,7 @@ combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
 /* int combiner group 34 */
 { EXT_GIC_ID_ONENAND_AUDI, EXT_GIC_ID_NFC },
 /* int combiner group 35 */
-{ 0, 0, 0, EXT_GIC_ID_MCT_L1, EXT_GIC_ID_MCT_G0, EXT_GIC_ID_MCT_G1 },
+{ 0, 0, 0, EXT_GIC_ID_MCT_L1 },
 /* int combiner group 36 */
 { EXT_GIC_ID_MIXER },
 /* int combiner group 37 */
@@ -240,11 +240,11 @@ combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
 /* groups 38-50 */
 { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { },
 /* int combiner group 51 */
-{ EXT_GIC_ID_MCT_L0, 0, 0, 0, EXT_GIC_ID_MCT_G0, EXT_GIC_ID_MCT_G1 },
+{ EXT_GIC_ID_MCT_L0 },
 /* group 52 */
 { },
 /* int combiner group 53 */
-{ EXT_GIC_ID_WDT, 0, 0, 0, EXT_GIC_ID_MCT_G0, EXT_GIC_ID_MCT_G1 },
+{ EXT_GIC_ID_WDT },
 /* groups 54-63 */
 { }, { }, { }, { }, { }, { }, { }, { }, { }, { }
 };
@@ -268,13 +268,11 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
 
 for (n = 0; n < EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ; n++) {
 irq_id = 0;
-if (n == EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 4) ||
-n == EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 4)) {
+if (n == EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 4)) {
 /* MCT_G0 is passed to External GIC */
 irq_id = EXT_GIC_ID_MCT_G0;
 }
-if (n == EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 5) ||
-n == EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 5)) {
+if (n == EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 5)) {
 /* MCT_G1 is passed to External and GIC */
 irq_id = EXT_GIC_ID_MCT_G1;
 }
-- 
2.25.1

[PATCH for-7.1 09/18] hw/arm/exynos4210: Drop ext_gic_irq[] from Exynos4210Irq struct

2022-04-04 Thread Peter Maydell

The only time we use the ext_gic_irq[] array in the Exynos4210Irq
struct is during realize of the SoC -- we initialize it with the
input IRQs of the external GIC device, and then connect those to
outputs of other devices further on in realize (including in the
exynos4210_init_board_irqs() function).  Now that the ext_gic object
is easily accessible as s->ext_gic we can make the connections
directly from one device to the other without going via this array.

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h |  1 -
 hw/arm/exynos4210.c | 12 ++--
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index f35ae9f..08f52c511ff 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -83,7 +83,6 @@
 typedef struct Exynos4210Irq {
 qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
-qemu_irq ext_gic_irq[EXYNOS4210_EXT_GIC_NIRQ];
 } Exynos4210Irq;
 
 struct Exynos4210State {
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 2058df9aecf..5a41af089f9 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -257,6 +257,7 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
 {
 uint32_t grp, bit, irq_id, n;
 Exynos4210Irq *is = &s->irqs;
+DeviceState *extgicdev = DEVICE(&s->ext_gic);
 
 for (n = 0; n < EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ; n++) {
 irq_id = 0;
@@ -272,7 +273,8 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
 }
 if (irq_id) {
 s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
-is->ext_gic_irq[irq_id - 32]);
+ qdev_get_gpio_in(extgicdev,
+  irq_id - 32));
 } else {
 s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
 is->ext_combiner_irq[n]);
@@ -287,7 +289,8 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
 
 if (irq_id) {
 s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
-is->ext_gic_irq[irq_id - 32]);
+ qdev_get_gpio_in(extgicdev,
+  irq_id - 32));
 }
 }
 }
@@ -466,9 +469,6 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 sysbus_connect_irq(busdev, n,
 qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 1));
 }
-for (n = 0; n < EXYNOS4210_EXT_GIC_NIRQ; n++) {
-s->irqs.ext_gic_irq[n] = qdev_get_gpio_in(DEVICE(&s->ext_gic), n);
-}
 
 /* Internal Interrupt Combiner */
 dev = qdev_new("exynos4210.combiner");
@@ -487,7 +487,7 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 busdev = SYS_BUS_DEVICE(dev);
 sysbus_realize_and_unref(busdev, &error_fatal);
 for (n = 0; n < EXYNOS4210_MAX_INT_COMBINER_OUT_IRQ; n++) {
-sysbus_connect_irq(busdev, n, s->irqs.ext_gic_irq[n]);
+sysbus_connect_irq(busdev, n, qdev_get_gpio_in(DEVICE(&s->ext_gic), n));
 }
 exynos4210_combiner_get_gpioin(&s->irqs, dev, 1);
 sysbus_mmio_map(busdev, 0, EXYNOS4210_EXT_COMBINER_BASE_ADDR);
-- 
2.25.1

[PATCH v9 41/45] qtest/cxl: Add aarch64 virt test for CXL

2022-04-04 Thread Jonathan Cameron via

Add a single complex case for aarch64 virt machine.

Signed-off-by: Jonathan Cameron 
---
 tests/qtest/cxl-test.c  | 48 +
 tests/qtest/meson.build |  1 +
 2 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
index 079011af6a..ac7d71fd74 100644
--- a/tests/qtest/cxl-test.c
+++ b/tests/qtest/cxl-test.c
@@ -17,6 +17,11 @@
   "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 " \
   "-cxl-fixed-memory-window targets.0=cxl.0,targets.1=cxl.1,size=4G "
 
+#define QEMU_VIRT_2PXB_CMD "-machine virt,cxl=on "  \
+  "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "  \
+  "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 "  \
+  "-cxl-fixed-memory-window targets.0=cxl.0,targets.1=cxl.1,size=4G "
+
 #define QEMU_RP "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 "
 
 /* Dual ports on first pxb */
@@ -134,18 +139,43 @@ static void cxl_2pxb_4rp_4t3d(void)
 qtest_end();
 }
 
+static void cxl_virt_2pxb_4rp_4t3d(void)
+{
+g_autoptr(GString) cmdline = g_string_new(NULL);
+char template[] = "/tmp/cxl-test-XXXXXX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+g_string_printf(cmdline, QEMU_VIRT_2PXB_CMD QEMU_4RP QEMU_4T3D,
+tmpfs, tmpfs, tmpfs, tmpfs, tmpfs, tmpfs,
+tmpfs, tmpfs);
+
+qtest_start(cmdline->str);
+qtest_end();
+}
+
 int main(int argc, char **argv)
 {
+const char *arch = qtest_get_arch();
+
 g_test_init(&argc, &argv, NULL);
 
-qtest_add_func("/pci/cxl/basic_hostbridge", cxl_basic_hb);
-qtest_add_func("/pci/cxl/basic_pxb", cxl_basic_pxb);
-qtest_add_func("/pci/cxl/pxb_with_window", cxl_pxb_with_window);
-qtest_add_func("/pci/cxl/pxb_x2_with_window", cxl_2pxb_with_window);
-qtest_add_func("/pci/cxl/rp", cxl_root_port);
-qtest_add_func("/pci/cxl/rp_x2", cxl_2root_port);
-qtest_add_func("/pci/cxl/type3_device", cxl_t3d);
-qtest_add_func("/pci/cxl/rp_x2_type3_x2", cxl_1pxb_2rp_2t3d);
-qtest_add_func("/pci/cxl/pxb_x2_root_port_x4_type3_x4", cxl_2pxb_4rp_4t3d);
+if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
+qtest_add_func("/pci/cxl/basic_hostbridge", cxl_basic_hb);
+qtest_add_func("/pci/cxl/basic_pxb", cxl_basic_pxb);
+qtest_add_func("/pci/cxl/pxb_with_window", cxl_pxb_with_window);
+qtest_add_func("/pci/cxl/pxb_x2_with_window", cxl_2pxb_with_window);
+qtest_add_func("/pci/cxl/rp", cxl_root_port);
+qtest_add_func("/pci/cxl/rp_x2", cxl_2root_port);
+qtest_add_func("/pci/cxl/type3_device", cxl_t3d);
+qtest_add_func("/pci/cxl/rp_x2_type3_x2", cxl_1pxb_2rp_2t3d);
+qtest_add_func("/pci/cxl/pxb_x2_root_port_x4_type3_x4",
+   cxl_2pxb_4rp_4t3d);
+} else if (strcmp(arch, "aarch64") == 0) {
+qtest_add_func("/pci/cxl/virt/pxb_x2_root_port_x4_type3_x4",
+   cxl_virt_2pxb_4rp_4t3d);
+}
+
 return g_test_run();
 }
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 6e1ad4dc9a..6aefde4584 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -224,6 +224,7 @@ qtests_aarch64 = \
   (config_all_devices.has_key('CONFIG_TPM_TIS_SYSBUS') ? ['tpm-tis-device-test'] : []) +\
   (config_all_devices.has_key('CONFIG_TPM_TIS_SYSBUS') ? ['tpm-tis-device-swtpm-test'] : []) +  \
   (config_all_devices.has_key('CONFIG_XLNX_ZYNQMP_ARM') ? ['xlnx-can-test', 'fuzz-xlnx-dp-test'] : []) + \
+  qtests_cxl + \
   ['arm-cpu-features',
'numa-test',
'boot-serial-test',
-- 
2.32.0

[PATCH for-7.1 10/18] hw/arm/exynos4210: Move exynos4210_combiner_get_gpioin() into exynos4210.c

2022-04-04 Thread Peter Maydell

The function exynos4210_combiner_get_gpioin() currently lives in
exynos4210_combiner.c, but it isn't really part of the combiner
device itself -- it is a function that implements the wiring up of
some interrupt sources to multiple combiner inputs.  Move it to live
with the other SoC-level code in exynos4210.c, along with a few
macros previously defined in exynos4210.h which are now used only
in exynos4210.c.
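
For readers unfamiliar with the moved macros, this standalone sketch shows how
they relate a combiner input number to its (group, bit) pair: eight inputs per
group, so the group is the quotient and the bit is the remainder. The macro
bodies mirror the ones in this patch with the EXYNOS4210_ prefix dropped; the
round-trip helper is invented for the illustration and assumes a non-negative
input number.

```c
#include <assert.h>

/* Mirrors of the macros moved by this patch (prefix dropped for brevity). */
#define COMBINER_GET_IRQ_NUM(grp, bit)  ((grp) * 8 + (bit))
#define COMBINER_GET_GRP_NUM(irq)       ((irq) / 8)
#define COMBINER_GET_BIT_NUM(irq) \
    ((irq) - 8 * COMBINER_GET_GRP_NUM(irq))

/*
 * Hypothetical helper: return 1 if splitting a non-negative input number
 * into (grp, bit) and recombining yields the original number.
 */
int combiner_roundtrip_ok(int irq)
{
    int grp = COMBINER_GET_GRP_NUM(irq);
    int bit = COMBINER_GET_BIT_NUM(irq);

    assert(bit >= 0 && bit < 8);
    return COMBINER_GET_IRQ_NUM(grp, bit) == irq;
}
```

For example, group 12 bit 4 is input 100, and input 100 splits back into
group 12, bit 4.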

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h   | 11 -
 hw/arm/exynos4210.c   | 82 +++
 hw/intc/exynos4210_combiner.c | 77 
 3 files changed, 82 insertions(+), 88 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index 08f52c511ff..b564e3582bb 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -67,11 +67,6 @@
 #define EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ   \
 (EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ * 8)
 
-#define EXYNOS4210_COMBINER_GET_IRQ_NUM(grp, bit)  ((grp)*8 + (bit))
-#define EXYNOS4210_COMBINER_GET_GRP_NUM(irq)   ((irq) / 8)
-#define EXYNOS4210_COMBINER_GET_BIT_NUM(irq) \
-((irq) - 8 * EXYNOS4210_COMBINER_GET_GRP_NUM(irq))
-
 /* IRQs number for external and internal GIC */
 #define EXYNOS4210_EXT_GIC_NIRQ (160-32)
 #define EXYNOS4210_INT_GIC_NIRQ 64
@@ -118,12 +113,6 @@ void exynos4210_write_secondary(ARMCPU *cpu,
  *  bit - bit number inside group */
 uint32_t exynos4210_get_irq(uint32_t grp, uint32_t bit);
 
-/*
- * Get Combiner input GPIO into irqs structure
- */
-void exynos4210_combiner_get_gpioin(Exynos4210Irq *irqs, DeviceState *dev,
-int ext);
-
 /*
  * exynos4210 UART
  */
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 5a41af089f9..86a9a0dae12 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -249,6 +249,11 @@ combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
 { }, { }, { }, { }, { }, { }, { }, { }, { }, { }
 };
 
+#define EXYNOS4210_COMBINER_GET_IRQ_NUM(grp, bit)  ((grp) * 8 + (bit))
+#define EXYNOS4210_COMBINER_GET_GRP_NUM(irq)   ((irq) / 8)
+#define EXYNOS4210_COMBINER_GET_BIT_NUM(irq) \
+((irq) - 8 * EXYNOS4210_COMBINER_GET_GRP_NUM(irq))
+
 /*
  * Initialize board IRQs.
  * These IRQs contain splitted Int/External Combiner and External Gic IRQs.
@@ -306,6 +311,83 @@ uint32_t exynos4210_get_irq(uint32_t grp, uint32_t bit)
 return EXYNOS4210_COMBINER_GET_IRQ_NUM(grp, bit);
 }
 
+/*
+ * Get Combiner input GPIO into irqs structure
+ */
+static void exynos4210_combiner_get_gpioin(Exynos4210Irq *irqs,
+   DeviceState *dev, int ext)
+{
+int n;
+int bit;
+int max;
+qemu_irq *irq;
+
+max = ext ? EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ :
+EXYNOS4210_MAX_INT_COMBINER_IN_IRQ;
+irq = ext ? irqs->ext_combiner_irq : irqs->int_combiner_irq;
+
+/*
+ * Some IRQs of Int/External Combiner are going to two Combiners groups,
+ * so let split them.
+ */
+for (n = 0; n < max; n++) {
+
+bit = EXYNOS4210_COMBINER_GET_BIT_NUM(n);
+
+switch (n) {
+/* MDNIE_LCD1 INTG1 */
+case EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 0) ...
+ EXYNOS4210_COMBINER_GET_IRQ_NUM(1, 3):
+irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
+irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(0, bit + 4)]);
+continue;
+
+/* TMU INTG3 */
+case EXYNOS4210_COMBINER_GET_IRQ_NUM(3, 4):
+irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
+irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(2, bit)]);
+continue;
+
+/* LCD1 INTG12 */
+case EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 0) ...
+ EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 3):
+irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
+irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(11, bit + 4)]);
+continue;
+
+/* Multi-Core Timer INTG12 */
+case EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 4) ...
+ EXYNOS4210_COMBINER_GET_IRQ_NUM(12, 8):
+   irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
+   irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
+continue;
+
+/* Multi-Core Timer INTG35 */
+case EXYNOS4210_COMBINER_GET_IRQ_NUM(35, 4) ...
+ EXYNOS4210_COMBINER_GET_IRQ_NUM(35, 8):
+irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
+irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
+continue;
+
+/* Multi-Core Timer INTG51 */
+case EXYNOS4210_COMBINER_GET_IRQ_NUM(51, 4) ...
+ EXYNOS4210_COMBINER_GET_IRQ_NUM(51, 8):
+irq[n] = qemu_irq_split(qdev_get_gpio_in(dev, n),
+irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]);
+continue;
+
+/* Multi-Core Timer INTG53 */
+case

[PATCH for-7.1] hw/arm/virt: Check for attempt to use TrustZone with KVM or HVF

2022-04-04 Thread Peter Maydell

It's not possible to provide the guest with the Security extensions
(TrustZone) when using KVM or HVF, because the hardware
virtualization extensions don't permit running EL3 guest code.
However, we weren't checking for this combination, with the result
that QEMU would assert if you tried it:

$ qemu-system-aarch64 -enable-kvm -machine virt,secure=on -cpu host -display none
Unexpected error in object_property_find_err() at ../../qom/object.c:1304:
qemu-system-aarch64: Property 'host-arm-cpu.secure-memory' not found
Aborted

Check for this combination of options and report an error, in the
same way we already do for attempts to give a KVM or HVF guest the
Virtualization or MTE extensions. Now we will report:

qemu-system-aarch64: mach-virt: KVM does not support providing Security extensions (TrustZone) to the guest CPU
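
The shape of the check can be sketched outside QEMU as below. This is a
simplified model, not the patch itself: the enum, the function name and the
returned strings are assumptions made for the illustration, and the MTE check
mentioned above is left out. Under those assumptions, TCG accepts every
combination, while KVM and HVF reject the Security and Virtualization
extensions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for what kvm_enabled()/hvf_enabled() report. */
enum accel { ACCEL_TCG, ACCEL_KVM, ACCEL_HVF };

/*
 * Return NULL if the option combination is acceptable, otherwise the name
 * of the first feature the accelerator cannot provide to the guest CPU.
 */
const char *unsupported_feature(enum accel a, bool secure, bool virt)
{
    if (a == ACCEL_TCG) {
        return NULL;    /* emulation can provide both extensions */
    }
    if (secure) {
        return "Security extensions (TrustZone)";
    }
    if (virt) {
        return "Virtualization extensions";
    }
    return NULL;
}
```

In the real machvirt_init() a non-NULL result corresponds to error_report()
followed by exit(1).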

Signed-off-by: Peter Maydell 
---
Not a regression, so not worth fixing in 7.0.
---
 hw/arm/virt.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d2e5ecd234a..8f94e2fde62 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2048,6 +2048,13 @@ static void machvirt_init(MachineState *machine)
 exit(1);
 }
 
+if (vms->secure && (kvm_enabled() || hvf_enabled())) {
+error_report("mach-virt: %s does not support providing "
+ "Security extensions (TrustZone) to the guest CPU",
+ kvm_enabled() ? "KVM" : "HVF");
+exit(1);
+}
+
 if (vms->virt && (kvm_enabled() || hvf_enabled())) {
 error_report("mach-virt: %s does not support providing "
  "Virtualization extensions to the guest CPU",
-- 
2.25.1

[PATCH for-7.1 14/18] hw/arm/exynos4210: Connect MCT_G0 and MCT_G1 to both combiners

2022-04-04 Thread Peter Maydell

Currently for the interrupts MCT_G0 and MCT_G1 which are
the only ones in the input range of the external combiner
and which are also wired to the external GIC, we connect
them only to the internal combiner and the external GIC.
This seems likely to be a bug, as all other interrupts
which are in the input range of both combiners are
connected to both combiners. (The fact that the code in
exynos4210_combiner_get_gpioin() is also trying to wire
up these inputs on both combiners also suggests this.)

Wire these interrupts up to both combiners, like the rest.
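
The resulting fan-out per splitter can be modelled as follows. This is only a
sketch of the wiring rule in this patch, with invented names: every line in
the shared input range goes to both combiners, and a non-zero irq_id adds the
external GIC as a third output.

```c
#include <assert.h>

/* Hypothetical labels for where a splitter output is connected. */
enum dest { DEST_INT_COMBINER, DEST_EXT_COMBINER, DEST_EXT_GIC };

/* Matches the "num-lines" expression in the patch: irq_id ? 3 : 2. */
unsigned splitter_num_lines(unsigned irq_id)
{
    return irq_id ? 3 : 2;
}

/* Destination of a given splitter output after this patch. */
enum dest splitter_dest(unsigned irq_id, unsigned output)
{
    assert(output < splitter_num_lines(irq_id));
    switch (output) {
    case 0:
        return DEST_INT_COMBINER;
    case 1:
        return DEST_EXT_COMBINER;
    default:
        return DEST_EXT_GIC;    /* only reachable when irq_id != 0 */
    }
}
```

Before the patch, output 1 went to the external GIC when irq_id was set, so
the external combiner never saw MCT_G0 or MCT_G1.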

Signed-off-by: Peter Maydell 
---
 hw/arm/exynos4210.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index a4527f819ef..962d6d0ac2a 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -281,16 +281,15 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
 
 assert(splitcount < EXYNOS4210_NUM_SPLITTERS);
 splitter = DEVICE(&s->splitter[splitcount]);
-qdev_prop_set_uint16(splitter, "num-lines", 2);
+qdev_prop_set_uint16(splitter, "num-lines", irq_id ? 3 : 2);
 qdev_realize(splitter, NULL, &error_abort);
 splitcount++;
 s->irq_table[n] = qdev_get_gpio_in(splitter, 0);
 qdev_connect_gpio_out(splitter, 0, is->int_combiner_irq[n]);
+qdev_connect_gpio_out(splitter, 1, is->ext_combiner_irq[n]);
 if (irq_id) {
-qdev_connect_gpio_out(splitter, 1,
+qdev_connect_gpio_out(splitter, 2,
   qdev_get_gpio_in(extgicdev, irq_id - 32));
-} else {
-qdev_connect_gpio_out(splitter, 1, is->ext_combiner_irq[n]);
 }
 }
 for (; n < EXYNOS4210_MAX_INT_COMBINER_IN_IRQ; n++) {
-- 
2.25.1

[PATCH v9 37/45] qtests/bios-tables-test: Add a test for CXL emulation.

2022-04-04 Thread Jonathan Cameron via

The DSDT includes several CXL specific elements and the CEDT
table is only present if we enable CXL.

The test exercises all current functionality with several
CFMWS, CHBS structures in CEDT and ACPI0016/ACPI0017 and _OSC
entries in DSDT.

Signed-off-by: Jonathan Cameron 
---
 tests/qtest/bios-tables-test.c | 44 ++
 1 file changed, 44 insertions(+)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index c4a2d1e166..577b26bcec 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -1537,6 +1537,49 @@ static void test_acpi_q35_viot(void)
 free_test_data(&data);
 }
 
+static void test_acpi_q35_cxl(void)
+{
+gchar *tmp_path = g_dir_make_tmp("qemu-test-cxl.XXXXXX", NULL);
+gchar *params;
+
+test_data data = {
+.machine = MACHINE_Q35,
+.variant = ".cxl",
+};
+/*
+ * A complex CXL setup.
+ */
+params = g_strdup_printf(" -machine cxl=on"
+ " -object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M"
+ " -object memory-backend-file,id=cxl-mem2,mem-path=%s,size=256M"
+ " -object memory-backend-file,id=cxl-mem3,mem-path=%s,size=256M"
+ " -object memory-backend-file,id=cxl-mem4,mem-path=%s,size=256M"
+ " -object memory-backend-file,id=lsa1,mem-path=%s,size=256M"
+ " -object memory-backend-file,id=lsa2,mem-path=%s,size=256M"
+ " -object memory-backend-file,id=lsa3,mem-path=%s,size=256M"
+ " -object memory-backend-file,id=lsa4,mem-path=%s,size=256M"
+ " -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1"
+ " -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2"
+ " -device cxl-rp,port=0,bus=cxl.1,id=rp1,chassis=0,slot=2"
+ " -device cxl-type3,bus=rp1,memdev=cxl-mem1,lsa=lsa1"
+ " -device cxl-rp,port=1,bus=cxl.1,id=rp2,chassis=0,slot=3"
+ " -device cxl-type3,bus=rp2,memdev=cxl-mem2,lsa=lsa2"
+ " -device cxl-rp,port=0,bus=cxl.2,id=rp3,chassis=0,slot=5"
+ " -device cxl-type3,bus=rp3,memdev=cxl-mem3,lsa=lsa3"
+ " -device cxl-rp,port=1,bus=cxl.2,id=rp4,chassis=0,slot=6"
+ " -device cxl-type3,bus=rp4,memdev=cxl-mem4,lsa=lsa4"
+ " -cxl-fixed-memory-window targets.0=cxl.1,size=4G,interleave-granularity=8k"
+ " -cxl-fixed-memory-window targets.0=cxl.1,targets.1=cxl.2,size=4G,interleave-granularity=8k",
+ tmp_path, tmp_path, tmp_path, tmp_path,
+ tmp_path, tmp_path, tmp_path, tmp_path);
+test_acpi_one(params, &data);
+
+g_free(params);
+g_assert(g_rmdir(tmp_path) == 0);
+g_free(tmp_path);
+free_test_data(&data);
+}
+
 static void test_acpi_virt_viot(void)
 {
 test_data data = {
@@ -1742,6 +1785,7 @@ int main(int argc, char *argv[])
 qtest_add_func("acpi/q35/kvm/dmar", test_acpi_q35_kvm_dmar);
 }
 qtest_add_func("acpi/q35/viot", test_acpi_q35_viot);
+qtest_add_func("acpi/q35/cxl", test_acpi_q35_cxl);
 qtest_add_func("acpi/q35/slic", test_acpi_q35_slic);
 } else if (strcmp(arch, "aarch64") == 0) {
 if (has_tcg) {
-- 
2.32.0

[PATCH for-7.1 13/18] hw/arm/exynos4210: Fill in irq_table[] for internal-combiner-only IRQ lines

2022-04-04 Thread Peter Maydell

In exynos4210_init_board_irqs(), the loop that handles IRQ lines that
are in a range that applies to the internal combiner only creates a
splitter for those interrupts which go to both the internal combiner
and to the external GIC, but it does nothing at all for the
interrupts which don't go to the external GIC, leaving the
irq_table[] array element empty for those.  (This will result in
those interrupts simply being lost, not in a QEMU crash.)

I don't have a reliable datasheet for this SoC, but since we do wire
up one interrupt line in this category (the HDMI I2C device on
interrupt 16,1), this seems like it must be a bug in the existing
QEMU code.  Fill in the irq_table[] entries where we're not splitting
the IRQ to both the internal combiner and the external GIC with the
IRQ line of the internal combiner.  (That is, these IRQ lines go to
just one device, not multiple.)

This bug didn't have any visible guest effects because the only
implemented device that was affected was the HDMI I2C controller,
and we never connect any I2C devices to that bus.

Signed-off-by: Peter Maydell 
---
 hw/arm/exynos4210.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 919821833b5..a4527f819ef 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -310,6 +310,8 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
 qdev_connect_gpio_out(splitter, 0, is->int_combiner_irq[n]);
 qdev_connect_gpio_out(splitter, 1,
   qdev_get_gpio_in(extgicdev, irq_id - 32));
+} else {
+s->irq_table[n] = is->int_combiner_irq[n];
 }
 }
 /*
-- 
2.25.1

[PATCH for-7.1 05/18] hw/arm/exynos4210: Coalesce board_irqs and irq_table

2022-04-04 Thread Peter Maydell

The exynos4210 code currently has two very similar arrays of IRQs:

 * board_irqs is a field of the Exynos4210Irq struct which is filled
   in by exynos4210_init_board_irqs() with the appropriate qemu_irqs
   for each IRQ the board/SoC can assert
 * irq_table is a set of qemu_irqs pointed to from the
   Exynos4210State struct.  It's allocated in exynos4210_init_irq,
   and the only behaviour these irqs have is that they pass on the
   level to the equivalent board_irqs[] irq

The extra indirection through irq_table is unnecessary, so coalesce
these into a single irq_table[] array as a direct field in
Exynos4210State which exynos4210_init_board_irqs() fills in.

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h |  8 ++--
 hw/arm/exynos4210.c |  6 +-
 hw/intc/exynos4210_gic.c| 32 
 3 files changed, 11 insertions(+), 35 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index 923ce987627..a9f186370ee 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -83,7 +83,6 @@ typedef struct Exynos4210Irq {
 qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
 qemu_irq ext_gic_irq[EXYNOS4210_EXT_GIC_NIRQ];
-qemu_irq board_irqs[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 } Exynos4210Irq;
 
 struct Exynos4210State {
@@ -92,7 +91,7 @@ struct Exynos4210State {
 /*< public >*/
 ARMCPU *cpu[EXYNOS4210_NCPUS];
 Exynos4210Irq irqs;
-qemu_irq *irq_table;
+qemu_irq irq_table[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 
 MemoryRegion chipid_mem;
 MemoryRegion iram_mem;
@@ -112,12 +111,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210State, EXYNOS4210_SOC)
 void exynos4210_write_secondary(ARMCPU *cpu,
 const struct arm_boot_info *info);
 
-/* Initialize exynos4210 IRQ subsystem stub */
-qemu_irq *exynos4210_init_irq(Exynos4210Irq *env);
-
 /* Initialize board IRQs.
  * These IRQs contain splitted Int/External Combiner and External Gic IRQs */
-void exynos4210_init_board_irqs(Exynos4210Irq *s);
+void exynos4210_init_board_irqs(Exynos4210State *s);
 
 /* Get IRQ number from exynos4210 IRQ subsystem stub.
  * To identify IRQ source use internal combiner group and bit number
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 60fc5a2ffe7..11e321d7830 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -228,10 +228,6 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
 }
 
-/*** IRQs ***/
-
-s->irq_table = exynos4210_init_irq(&s->irqs);
-
 /* IRQ Gate */
 for (i = 0; i < EXYNOS4210_NCPUS; i++) {
 DeviceState *orgate = DEVICE(&s->cpu_irq_orgate[i]);
@@ -296,7 +292,7 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 sysbus_mmio_map(busdev, 0, EXYNOS4210_EXT_COMBINER_BASE_ADDR);
 
 /* Initialize board IRQs. */
-exynos4210_init_board_irqs(&s->irqs);
+exynos4210_init_board_irqs(s);
 
 /*** Memory ***/
 
diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
index 794f6b5ac72..ec79b96f6d1 100644
--- a/hw/intc/exynos4210_gic.c
+++ b/hw/intc/exynos4210_gic.c
@@ -192,30 +192,14 @@ combiner_grp_to_gic_id[64-EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
 #define EXYNOS4210_GIC_CPU_REGION_SIZE  0x100
 #define EXYNOS4210_GIC_DIST_REGION_SIZE 0x1000
 
-static void exynos4210_irq_handler(void *opaque, int irq, int level)
-{
-Exynos4210Irq *s = (Exynos4210Irq *)opaque;
-
-/* Bypass */
-qemu_set_irq(s->board_irqs[irq], level);
-}
-
-/*
- * Initialize exynos4210 IRQ subsystem stub.
- */
-qemu_irq *exynos4210_init_irq(Exynos4210Irq *s)
-{
-return qemu_allocate_irqs(exynos4210_irq_handler, s,
-EXYNOS4210_MAX_INT_COMBINER_IN_IRQ);
-}
-
 /*
  * Initialize board IRQs.
  * These IRQs contain splitted Int/External Combiner and External Gic IRQs.
  */
-void exynos4210_init_board_irqs(Exynos4210Irq *s)
+void exynos4210_init_board_irqs(Exynos4210State *s)
 {
 uint32_t grp, bit, irq_id, n;
+Exynos4210Irq *is = &s->irqs;
 
 for (n = 0; n < EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ; n++) {
 irq_id = 0;
@@ -230,11 +214,11 @@ void exynos4210_init_board_irqs(Exynos4210Irq *s)
 irq_id = EXT_GIC_ID_MCT_G1;
 }
 if (irq_id) {
-s->board_irqs[n] = qemu_irq_split(s->int_combiner_irq[n],
-s->ext_gic_irq[irq_id-32]);
+s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
+is->ext_gic_irq[irq_id - 32]);
 } else {
-s->board_irqs[n] = qemu_irq_split(s->int_combiner_irq[n],
-s->ext_combiner_irq[n]);
+s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
+is->ext_combiner_irq[n]);
 }
 }
 for (; n < EXYNOS4210_MAX_INT_COMBINER_IN_IRQ; n++) {
@@ -245,8 +229,8 @@ void

[PATCH v9 36/45] tests/acpi: q35: Allow addition of a CXL test.

2022-04-04 Thread Jonathan Cameron via

Add exceptions for the DSDT and the new CEDT tables
specific to a new CXL test in the following patch.

Signed-off-by: Jonathan Cameron 
---
 tests/data/acpi/q35/CEDT.cxl| 0
 tests/data/acpi/q35/DSDT.cxl| 0
 tests/qtest/bios-tables-test-allowed-diff.h | 2 ++
 3 files changed, 2 insertions(+)

diff --git a/tests/data/acpi/q35/CEDT.cxl b/tests/data/acpi/q35/CEDT.cxl
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/q35/DSDT.cxl b/tests/data/acpi/q35/DSDT.cxl
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..7c7f9fbc44 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,3 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/q35/DSDT.cxl",
+"tests/data/acpi/q35/CEDT.cxl",
-- 
2.32.0

[PATCH for-7.1 16/18] hw/arm/exynos4210: Fold combiner splits into exynos4210_init_board_irqs()

2022-04-04 Thread Peter Maydell

At this point, the function exynos4210_init_board_irqs() splits input
IRQ lines to connect them to the input combiner, output combiner and
external GIC.  The function exynos4210_combiner_get_gpioin() splits
some of the combiner input lines further to connect them to multiple
different inputs on the combiner.

Because (unlike qemu_irq_split()) the TYPE_SPLIT_IRQ device has a
configurable number of outputs, we can do all this in one place, by
making exynos4210_init_board_irqs() add extra outputs to the splitter
device when it must be connected to more than one input on each
combiner.

We do this with a new data structure, the combinermap, which is an
array each of whose elements is a list of the interrupt IDs on the
combiner which must be tied together.  As we loop through each
interrupt ID, if we find that it is the first one in one of these
lists, we configure the splitter device with enough extra outputs and
wire them up to the other interrupt IDs in the list.

Conveniently, for all the cases where this is necessary, the
lowest-numbered interrupt ID in each group is in the range of the
external combiner, so we only need to code for this in the first of
the two loops in exynos4210_init_board_irqs().
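
As a rough illustration of the lookup described above (not the code from
this patch), the following standalone sketch shows how a "first entry in
a list" test and an output count could work. The names demo_map,
demo_map_entry() and demo_map_count() are hypothetical, and the two-row
table is a cut-down stand-in for the real combinermap; the only things
taken from the patch are the "group * 8 + bit" numbering and the use of
0 as a terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: 8 inputs per combiner group, 0 ends a list. */
#define IRQNO(grp, bit) ((grp) * 8 + (bit))
#define IRQNONE 0
#define MAP_WIDTH 6

/* Cut-down example table; unused trailing slots are zero-initialized. */
static const int demo_map[][MAP_WIDTH] = {
    { IRQNO(0, 4), IRQNO(1, 0), IRQNONE },
    { IRQNO(1, 4), IRQNO(12, 4), IRQNO(35, 4), IRQNO(51, 4), IRQNO(53, 4),
      IRQNONE },
};

/* Return the list whose first element is irq, or NULL if irq leads none. */
static const int *demo_map_entry(int irq)
{
    size_t i;

    for (i = 0; i < sizeof(demo_map) / sizeof(demo_map[0]); i++) {
        if (demo_map[i][0] == irq) {
            return demo_map[i];
        }
    }
    return NULL;
}

/* Count the combiner inputs tied together in one list (terminator excluded). */
static int demo_map_count(const int *entry)
{
    int n = 0;

    assert(entry != NULL);
    while (n < MAP_WIDTH && entry[n] != IRQNONE) {
        n++;
    }
    return n;
}
```

In this sketch an interrupt ID that appears only in second or later
position (such as IRQNO(1, 0)) gets no list of its own, which mirrors the
idea that only the lowest-numbered ID in each group owns the splitter.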

The old code in exynos4210_combiner_get_gpioin() which is being
deleted here had several problems which don't exist in the new code
in its handling of the multi-core timer interrupts:
 (1) the case labels specified bits 4 ... 8, but bit '8' doesn't
 exist; these should have been 4 ... 7
 (2) it used the input irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]
 multiple times as the input of several different splitters,
 which isn't allowed
 (3) in an apparent cut-and-paste error, the cases for all the
 multi-core timer inputs used "bit + 4" even though the
 bit range for the case was (intended to be) 4 ... 7, which
 meant it was looking at non-existent bits 8 ... 11.
None of these exist in the new code.

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h |   6 +-
 hw/arm/exynos4210.c | 178 +++-
 2 files changed, 119 insertions(+), 65 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index 7da3eddea5f..f24617f681d 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -74,10 +74,12 @@
 
 /*
  * We need one splitter for every external combiner input, plus
- * one for every non-zero entry in combiner_grp_to_gic_id[].
+ * one for every non-zero entry in combiner_grp_to_gic_id[],
+ * minus one for every external combiner ID in second or later
+ * places in a combinermap[] line.
  * We'll assert in exynos4210_init_board_irqs() if this is wrong.
  */
-#define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 54)
+#define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 38)
 
 typedef struct Exynos4210Irq {
 qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 39e334e0773..05b28cf5905 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -254,6 +254,76 @@ combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
 #define EXYNOS4210_COMBINER_GET_BIT_NUM(irq) \
 ((irq) - 8 * EXYNOS4210_COMBINER_GET_GRP_NUM(irq))
 
+/*
+ * Some interrupt lines go to multiple combiner inputs.
+ * This data structure defines those: each array element is
+ * a list of combiner inputs which are connected together;
+ * the one with the smallest interrupt ID value must be first.
+ * As with combiner_grp_to_gic_id[], we rely on (0, 0) not being
+ * wired to anything so we can use 0 as a terminator.
+ */
+#define IRQNO(G, B) EXYNOS4210_COMBINER_GET_IRQ_NUM(G, B)
+#define IRQNONE 0
+
+#define COMBINERMAP_SIZE 16
+
+static const int combinermap[COMBINERMAP_SIZE][6] = {
+/* MDNIE_LCD1 */
+{ IRQNO(0, 4), IRQNO(1, 0), IRQNONE },
+{ IRQNO(0, 5), IRQNO(1, 1), IRQNONE },
+{ IRQNO(0, 6), IRQNO(1, 2), IRQNONE },
+{ IRQNO(0, 7), IRQNO(1, 3), IRQNONE },
+/* TMU */
+{ IRQNO(2, 4), IRQNO(3, 4), IRQNONE },
+{ IRQNO(2, 5), IRQNO(3, 5), IRQNONE },
+{ IRQNO(2, 6), IRQNO(3, 6), IRQNONE },
+{ IRQNO(2, 7), IRQNO(3, 7), IRQNONE },
+/* LCD1 */
+{ IRQNO(11, 4), IRQNO(12, 0), IRQNONE },
+{ IRQNO(11, 5), IRQNO(12, 1), IRQNONE },
+{ IRQNO(11, 6), IRQNO(12, 2), IRQNONE },
+{ IRQNO(11, 7), IRQNO(12, 3), IRQNONE },
+/* Multi-core timer */
+{ IRQNO(1, 4), IRQNO(12, 4), IRQNO(35, 4), IRQNO(51, 4), IRQNO(53, 4), IRQNONE },
+{ IRQNO(1, 5), IRQNO(12, 5), IRQNO(35, 5), IRQNO(51, 5), IRQNO(53, 5), IRQNONE },
+{ IRQNO(1, 6), IRQNO(12, 6), IRQNO(35, 6), IRQNO(51, 6), IRQNO(53, 6), IRQNONE },
+{ IRQNO(1, 7), IRQNO(12, 7), IRQNO(35, 7), IRQNO(51, 7), IRQNO(53, 7), IRQNONE },
+};
+
+#undef IRQNO
+
+static const int *combinermap_entry(int irq)
+{
+/*
+ * If the interrupt number passed in is the first entry in some
+ * line of the

[PATCH for-7.1 04/18] hw/arm/exynos4210: Drop int_gic_irq[] from Exynos4210Irq struct

2022-04-04 Thread Peter Maydell

The only time we use the int_gic_irq[] array in the Exynos4210Irq
struct is in the exynos4210_realize() function: we initialize it with
the GPIO inputs of the a9mpcore device, and then a bit later on we
connect those to the outputs of the internal combiner.  Now that the
a9mpcore object is easily accessible as s->a9mpcore we can make the
connection directly from one device to the other without going via
this array.

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h | 1 -
 hw/arm/exynos4210.c | 6 ++
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index 215c039b414..923ce987627 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -82,7 +82,6 @@
 typedef struct Exynos4210Irq {
 qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
-qemu_irq int_gic_irq[EXYNOS4210_INT_GIC_NIRQ];
 qemu_irq ext_gic_irq[EXYNOS4210_EXT_GIC_NIRQ];
 qemu_irq board_irqs[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 } Exynos4210Irq;
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index ef4d646eb91..60fc5a2ffe7 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -252,9 +252,6 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 sysbus_connect_irq(busdev, n,
 qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 0));
 }
-for (n = 0; n < EXYNOS4210_INT_GIC_NIRQ; n++) {
-s->irqs.int_gic_irq[n] = qdev_get_gpio_in(DEVICE(&s->a9mpcore), n);
-}
 
 /* Cache controller */
 sysbus_create_simple("l2x0", EXYNOS4210_L2X0_BASE_ADDR, NULL);
@@ -281,7 +278,8 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 busdev = SYS_BUS_DEVICE(dev);
 sysbus_realize_and_unref(busdev, &error_fatal);
 for (n = 0; n < EXYNOS4210_MAX_INT_COMBINER_OUT_IRQ; n++) {
-sysbus_connect_irq(busdev, n, s->irqs.int_gic_irq[n]);
+sysbus_connect_irq(busdev, n,
+   qdev_get_gpio_in(DEVICE(&s->a9mpcore), n));
 }
 exynos4210_combiner_get_gpioin(&s->irqs, dev, 0);
 sysbus_mmio_map(busdev, 0, EXYNOS4210_INT_COMBINER_BASE_ADDR);
-- 
2.25.1

[PATCH for-7.1 11/18] hw/arm/exynos4210: Delete unused macro definitions

2022-04-04 Thread Peter Maydell

Delete a couple of #defines which are never used.

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index b564e3582bb..f0769a4045b 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -67,10 +67,6 @@
 #define EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ   \
 (EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ * 8)
 
-/* IRQs number for external and internal GIC */
-#define EXYNOS4210_EXT_GIC_NIRQ (160-32)
-#define EXYNOS4210_INT_GIC_NIRQ 64
-
 #define EXYNOS4210_I2C_NUMBER   9
 
 #define EXYNOS4210_NUM_DMA  3
-- 
2.25.1

[PATCH v9 33/45] cxl/cxl-host: Add memops for CFMWS region.

2022-04-04 Thread Jonathan Cameron via

From: Jonathan Cameron 

These memops perform interleave decoding, walking down the
CXL topology from CFMWS described host interleave
decoder via CXL host bridge HDM decoders, through the CXL
root ports and finally call CXL type 3 specific read and write
functions.

Note that, whilst functional, the current implementation does
not support:
* switches
* multiple HDM decoders at a given level.
* unaligned accesses across the interleave boundaries
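
To make the decode step concrete, here is a small standalone sketch of
the arithmetic involved; it is not the QEMU implementation. It assumes
the CXL 2.0 encodings (granularity = 256 bytes << IG, ways = 1 << IW) and
a target list of eight one-byte port numbers split across a low and a
high 32-bit register. The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: interleave granularity in bytes is 256 << ig_enc. */
static uint64_t demo_decode_ig(unsigned ig_enc)
{
    return UINT64_C(256) << ig_enc;
}

/* Assumed encoding: number of interleave ways is 1 << iw_enc. */
static unsigned demo_target_index(uint64_t addr, unsigned ig_enc,
                                  unsigned iw_enc)
{
    return (unsigned)((addr / demo_decode_ig(ig_enc)) % (1u << iw_enc));
}

/* Pick one byte from a target list held in two 32-bit registers. */
static uint8_t demo_target_port(uint32_t list_lo, uint32_t list_hi,
                                unsigned idx)
{
    uint32_t reg;

    assert(idx < 8);
    reg = idx < 4 ? list_lo : list_hi;
    return (uint8_t)(reg >> ((idx % 4) * 8));
}
```

With an 8k granularity (ig_enc of 5) and two ways, consecutive 8 KiB
chunks of the address space alternate between target index 0 and 1; the
same calculation is repeated at each decoder level on the way down.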

Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-host-stubs.c |   2 +
 hw/cxl/cxl-host.c   | 128 
 include/hw/cxl/cxl.h|   2 +
 3 files changed, 132 insertions(+)

diff --git a/hw/cxl/cxl-host-stubs.c b/hw/cxl/cxl-host-stubs.c
index f8fd278d5d..24465a52ab 100644
--- a/hw/cxl/cxl-host-stubs.c
+++ b/hw/cxl/cxl-host-stubs.c
@@ -12,3 +12,5 @@ void cxl_fixed_memory_window_config(MachineState *ms,
 Error **errp) {};
 
 void cxl_fixed_memory_window_link_targets(Error **errp) {};
+
+const MemoryRegionOps cfmws_ops;
diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
index ec5a75cbf5..469b3c4ced 100644
--- a/hw/cxl/cxl-host.c
+++ b/hw/cxl/cxl-host.c
@@ -15,6 +15,10 @@
 
 #include "qapi/qapi-visit-machine.h"
 #include "hw/cxl/cxl.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pci_host.h"
+#include "hw/pci/pcie_port.h"
 
 void cxl_fixed_memory_window_config(MachineState *ms,
 CXLFixedMemoryWindowOptions *object,
@@ -92,3 +96,127 @@ void cxl_fixed_memory_window_link_targets(Error **errp)
 }
 }
 }
+
+/* TODO: support multiple hdm decoders */
+static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr,
+uint8_t *target)
+{
+uint32_t ctrl;
+uint32_t ig_enc;
+uint32_t iw_enc;
+uint32_t target_reg;
+uint32_t target_idx;
+
+ctrl = cache_mem[R_CXL_HDM_DECODER0_CTRL];
+if (!FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) {
+return false;
+}
+
+ig_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IG);
+iw_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IW);
+target_idx = (addr / cxl_decode_ig(ig_enc)) % (1 << iw_enc);
+
+if (target_idx < 4) {
+target_reg = cache_mem[R_CXL_HDM_DECODER0_TARGET_LIST_LO];
+target_reg >>= target_idx * 8;
+} else {
+target_reg = cache_mem[R_CXL_HDM_DECODER0_TARGET_LIST_HI];
+target_reg >>= (target_idx - 4) * 8;
+}
+*target = target_reg & 0xff;
+
+return true;
+}
+
+static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
+{
+CXLComponentState *hb_cstate;
+PCIHostState *hb;
+int rb_index;
+uint32_t *cache_mem;
+uint8_t target;
+bool target_found;
+PCIDevice *rp, *d;
+
+/* Address is relative to memory region. Convert to HPA */
+addr += fw->base;
+
+rb_index = (addr / cxl_decode_ig(fw->enc_int_gran)) % fw->num_targets;
+hb = PCI_HOST_BRIDGE(fw->target_hbs[rb_index]->cxl.cxl_host_bridge);
+if (!hb || !hb->bus || !pci_bus_is_cxl(hb->bus)) {
+return NULL;
+}
+
+hb_cstate = cxl_get_hb_cstate(hb);
+if (!hb_cstate) {
+return NULL;
+}
+
+cache_mem = hb_cstate->crb.cache_mem_registers;
+
+target_found = cxl_hdm_find_target(cache_mem, addr, &target);
+if (!target_found) {
+return NULL;
+}
+
+rp = pcie_find_port_by_pn(hb->bus, target);
+if (!rp) {
+return NULL;
+}
+
+d = pci_bridge_get_sec_bus(PCI_BRIDGE(rp))->devices[0];
+
+if (!d || !object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) {
+return NULL;
+}
+
+return d;
+}
+
+static MemTxResult cxl_read_cfmws(void *opaque, hwaddr addr, uint64_t *data,
+  unsigned size, MemTxAttrs attrs)
+{
+CXLFixedWindow *fw = opaque;
+PCIDevice *d;
+
+d = cxl_cfmws_find_device(fw, addr);
+if (d == NULL) {
+*data = 0;
+/* Reads to invalid address return poison */
+return MEMTX_ERROR;
+}
+
+return cxl_type3_read(d, addr + fw->base, data, size, attrs);
+}
+
+static MemTxResult cxl_write_cfmws(void *opaque, hwaddr addr,
+   uint64_t data, unsigned size,
+   MemTxAttrs attrs)
+{
+CXLFixedWindow *fw = opaque;
+PCIDevice *d;
+
+d = cxl_cfmws_find_device(fw, addr);
+if (d == NULL) {
+/* Writes to invalid address are silent */
+return MEMTX_OK;
+}
+
+return cxl_type3_write(d, addr + fw->base, data, size, attrs);
+}
+
+const MemoryRegionOps cfmws_ops = {
+.read_with_attrs = cxl_read_cfmws,
+.write_with_attrs = cxl_write_cfmws,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = true,
+},
+.impl = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = true,
+},
+};
diff --git

[PATCH for-7.1 12/18] hw/arm/exynos4210: Use TYPE_SPLIT_IRQ in exynos4210_init_board_irqs()

2022-04-04 Thread Peter Maydell

In exynos4210_init_board_irqs(), use the TYPE_SPLIT_IRQ device
instead of qemu_irq_split().

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h |  9 
 hw/arm/exynos4210.c | 41 +
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index f0769a4045b..f58ee0f2686 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -28,6 +28,7 @@
 #include "hw/sysbus.h"
 #include "hw/cpu/a9mpcore.h"
 #include "hw/intc/exynos4210_gic.h"
+#include "hw/core/split-irq.h"
 #include "target/arm/cpu-qom.h"
 #include "qom/object.h"
 
@@ -71,6 +72,13 @@
 
 #define EXYNOS4210_NUM_DMA  3
 
+/*
+ * We need one splitter for every external combiner input, plus
+ * one for every non-zero entry in combiner_grp_to_gic_id[].
+ * We'll assert in exynos4210_init_board_irqs() if this is wrong.
+ */
+#define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 60)
+
 typedef struct Exynos4210Irq {
 qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
@@ -95,6 +103,7 @@ struct Exynos4210State {
 qemu_or_irq cpu_irq_orgate[EXYNOS4210_NCPUS];
 A9MPPrivState a9mpcore;
 Exynos4210GicState ext_gic;
+SplitIRQ splitter[EXYNOS4210_NUM_SPLITTERS];
 };
 
 #define TYPE_EXYNOS4210_SOC "exynos4210"
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 86a9a0dae12..919821833b5 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -263,6 +263,8 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
 uint32_t grp, bit, irq_id, n;
 Exynos4210Irq *is = &s->irqs;
 DeviceState *extgicdev = DEVICE(&s->ext_gic);
+int splitcount = 0;
+DeviceState *splitter;
 
 for (n = 0; n < EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ; n++) {
 irq_id = 0;
@@ -276,13 +278,19 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
 /* MCT_G1 is passed to External and GIC */
 irq_id = EXT_GIC_ID_MCT_G1;
 }
+
+assert(splitcount < EXYNOS4210_NUM_SPLITTERS);
+splitter = DEVICE(&s->splitter[splitcount]);
+qdev_prop_set_uint16(splitter, "num-lines", 2);
+qdev_realize(splitter, NULL, &error_abort);
+splitcount++;
+s->irq_table[n] = qdev_get_gpio_in(splitter, 0);
+qdev_connect_gpio_out(splitter, 0, is->int_combiner_irq[n]);
 if (irq_id) {
-s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
- qdev_get_gpio_in(extgicdev,
-  irq_id - 32));
+qdev_connect_gpio_out(splitter, 1,
+  qdev_get_gpio_in(extgicdev, irq_id - 32));
 } else {
-s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
-is->ext_combiner_irq[n]);
+qdev_connect_gpio_out(splitter, 1, is->ext_combiner_irq[n]);
 }
 }
 for (; n < EXYNOS4210_MAX_INT_COMBINER_IN_IRQ; n++) {
@@ -293,11 +301,23 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
  EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][bit];
 
 if (irq_id) {
-s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
- qdev_get_gpio_in(extgicdev,
-  irq_id - 32));
+assert(splitcount < EXYNOS4210_NUM_SPLITTERS);
+splitter = DEVICE(&s->splitter[splitcount]);
+qdev_prop_set_uint16(splitter, "num-lines", 2);
+qdev_realize(splitter, NULL, &error_abort);
+splitcount++;
+s->irq_table[n] = qdev_get_gpio_in(splitter, 0);
+qdev_connect_gpio_out(splitter, 0, is->int_combiner_irq[n]);
+qdev_connect_gpio_out(splitter, 1,
+  qdev_get_gpio_in(extgicdev, irq_id - 32));
 }
 }
+/*
+ * We check this here to avoid a more obscure assert later when
+ * qdev_assert_realized_properly() checks that we realized every
+ * child object we initialized.
+ */
+assert(splitcount == EXYNOS4210_NUM_SPLITTERS);
 }
 
 /*
@@ -766,6 +786,11 @@ static void exynos4210_init(Object *obj)
 object_initialize_child(obj, name, &s->cpu_irq_orgate[i], TYPE_OR_IRQ);
 }
 
+for (i = 0; i < ARRAY_SIZE(s->splitter); i++) {
+g_autofree char *name = g_strdup_printf("irq-splitter%d", i);
+object_initialize_child(obj, name, &s->splitter[i], TYPE_SPLIT_IRQ);
+}
+
 object_initialize_child(obj, "a9mpcore", &s->a9mpcore, TYPE_A9MPCORE_PRIV);
 object_initialize_child(obj, "ext-gic", &s->ext_gic, TYPE_EXYNOS4210_GIC);
 }
-- 
2.25.1

[PATCH for-7.1 03/18] hw/arm/exynos4210: Put a9mpcore device into state struct

2022-04-04 Thread Peter Maydell

The exynos4210 SoC mostly creates its child devices as if it were
board code.  This includes the a9mpcore object.  Switch that to a
new-style "embedded in the state struct" creation, because in the
next commit we're going to want to refer to the object again further
down in the exynos4210_realize() function.

Signed-off-by: Peter Maydell 
---
I don't propose to try to do a full conversion of every child
device; I'm only going to do them where it makes a subsequent
commit a bit nicer, like this one.
---
 include/hw/arm/exynos4210.h |  2 ++
 hw/arm/exynos4210.c | 11 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index 3999034053e..215c039b414 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -26,6 +26,7 @@
 
 #include "hw/or-irq.h"
 #include "hw/sysbus.h"
+#include "hw/cpu/a9mpcore.h"
 #include "target/arm/cpu-qom.h"
 #include "qom/object.h"
 
@@ -103,6 +104,7 @@ struct Exynos4210State {
 I2CBus *i2c_if[EXYNOS4210_I2C_NUMBER];
 qemu_or_irq pl330_irq_orgate[EXYNOS4210_NUM_DMA];
 qemu_or_irq cpu_irq_orgate[EXYNOS4210_NCPUS];
+A9MPPrivState a9mpcore;
 };
 
 #define TYPE_EXYNOS4210_SOC "exynos4210"
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index dfc0a4eec25..ef4d646eb91 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -244,17 +244,16 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 }
 
 /* Private memory region and Internal GIC */
-dev = qdev_new(TYPE_A9MPCORE_PRIV);
-qdev_prop_set_uint32(dev, "num-cpu", EXYNOS4210_NCPUS);
-busdev = SYS_BUS_DEVICE(dev);
-sysbus_realize_and_unref(busdev, &error_fatal);
+qdev_prop_set_uint32(DEVICE(&s->a9mpcore), "num-cpu", EXYNOS4210_NCPUS);
+busdev = SYS_BUS_DEVICE(&s->a9mpcore);
+sysbus_realize(busdev, &error_fatal);
 sysbus_mmio_map(busdev, 0, EXYNOS4210_SMP_PRIVATE_BASE_ADDR);
 for (n = 0; n < EXYNOS4210_NCPUS; n++) {
 sysbus_connect_irq(busdev, n,
 qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 0));
 }
 for (n = 0; n < EXYNOS4210_INT_GIC_NIRQ; n++) {
-s->irqs.int_gic_irq[n] = qdev_get_gpio_in(dev, n);
+s->irqs.int_gic_irq[n] = qdev_get_gpio_in(DEVICE(&s->a9mpcore), n);
 }
 
 /* Cache controller */
@@ -489,6 +488,8 @@ static void exynos4210_init(Object *obj)
 g_autofree char *name = g_strdup_printf("cpu-irq-orgate%d", i);
 object_initialize_child(obj, name, &s->cpu_irq_orgate[i], TYPE_OR_IRQ);
 }
+
+object_initialize_child(obj, "a9mpcore", &s->a9mpcore, TYPE_A9MPCORE_PRIV);
 }
 
 static void exynos4210_class_init(ObjectClass *klass, void *data)
-- 
2.25.1

[PATCH v9 29/45] hw/pci-host/gpex-acpi: Add support for dsdt construction for pxb-cxl

2022-04-04 Thread Jonathan Cameron via

This adds code to instantiate the slightly extended ACPI root port
description in DSDT as per the CXL 2.0 specification.

Basically a cut and paste job from the i386/pc code.

Signed-off-by: Jonathan Cameron 
Signed-off-by: Ben Widawsky 
Reviewed-by: Alex Bennée 
---
 hw/arm/Kconfig  |  1 +
 hw/pci-host/gpex-acpi.c | 20 +---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 97f3b38019..219262a8da 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -29,6 +29,7 @@ config ARM_VIRT
 select ACPI_APEI
 select ACPI_VIOT
 select VIRTIO_MEM_SUPPORTED
+select ACPI_CXL
 
 config CHEETAH
 bool
diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index e7e162a00a..7c7316bc96 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -5,6 +5,7 @@
 #include "hw/pci/pci_bus.h"
 #include "hw/pci/pci_bridge.h"
 #include "hw/pci/pcie_host.h"
+#include "hw/acpi/cxl.h"
 
 static void acpi_dsdt_add_pci_route_table(Aml *dev, uint32_t irq)
 {
@@ -139,6 +140,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 QLIST_FOREACH(bus, &bus->child, sibling) {
 uint8_t bus_num = pci_bus_num(bus);
 uint8_t numa_node = pci_bus_numa_node(bus);
+bool is_cxl = pci_bus_is_cxl(bus);
 
 if (!pci_bus_is_root(bus)) {
 continue;
@@ -154,8 +156,16 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 }
 
 dev = aml_device("PC%.02X", bus_num);
-aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A08")));
-aml_append(dev, aml_name_decl("_CID", aml_string("PNP0A03")));
+if (is_cxl) {
+struct Aml *pkg = aml_package(2);
+aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0016")));
+aml_append(pkg, aml_eisaid("PNP0A08"));
+aml_append(pkg, aml_eisaid("PNP0A03"));
+aml_append(dev, aml_name_decl("_CID", pkg));
+} else {
+aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A08")));
+aml_append(dev, aml_name_decl("_CID", aml_string("PNP0A03")));
+}
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_STR", aml_unicode("pxb Device")));
@@ -175,7 +185,11 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 cfg->pio.base, 0, 0, 0);
 aml_append(dev, aml_name_decl("_CRS", crs));
 
-acpi_dsdt_add_pci_osc(dev);
+if (is_cxl) {
+build_cxl_osc_method(dev);
+} else {
+acpi_dsdt_add_pci_osc(dev);
+}
 
 aml_append(scope, dev);
 }
-- 
2.32.0

[PATCH for-7.1 00/18] hw/arm: Make exynos4210 use TYPE_SPLIT_IRQ

2022-04-04 Thread Peter Maydell

The primary aim of this patchset is to make the exynos4210 code use
the TYPE_SPLIT_IRQ device instead of the old qemu_irq_split() function
(which we are trying to get rid of). However, the current code is
quite complicated and so we have to do a fair amount of refactoring
in order to be able to use TYPE_SPLIT_IRQ in a clean way.

The interrupt wiring on this SoC is complicated and interrupt
lines from devices may be wired up to multiple places:
 * a GIC device
 * an internal combiner
 * an external combiner
(a combiner is a fairly simple "OR multiple IRQ sources together
in groups with enable/disable and mask logic" piece of hardware).
In some cases an interrupt is wired up to more than one input
on each combiner.
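
For readers unfamiliar with the hardware, the "OR in groups with enable
logic" behaviour of a combiner can be modelled roughly as below. This is
a toy model written for this description only: the DemoCombiner type,
the group count of 4 and the function names are all hypothetical, and
the real device has more register state than this.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_GROUPS 4

/* Toy combiner: each group ORs together up to 8 enabled input lines. */
typedef struct DemoCombiner {
    uint8_t level[DEMO_GROUPS];   /* raw input levels, one bit per source */
    uint8_t enable[DEMO_GROUPS];  /* per-source enable bits */
} DemoCombiner;

/* Record the level of one input line (group grp, bit position bit). */
static void demo_combiner_set_input(DemoCombiner *c, unsigned grp,
                                    unsigned bit, int level)
{
    assert(grp < DEMO_GROUPS && bit < 8);
    if (level) {
        c->level[grp] |= (uint8_t)(1u << bit);
    } else {
        c->level[grp] &= (uint8_t)~(1u << bit);
    }
}

/* A group's output is high if any enabled input in the group is high. */
static int demo_combiner_output(const DemoCombiner *c, unsigned grp)
{
    assert(grp < DEMO_GROUPS);
    return (c->level[grp] & c->enable[grp]) != 0;
}
```

An input that is raised but not enabled contributes nothing to the group
output, which is why the same source line is sometimes routed to more
than one input.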

The current code has a struct Exynos4210Irq where it keeps arrays of
qemu_irqs corresponding to the inputs of these devices, and it handles
the "wire interrupt lines to multiple places" logic in functions which
are called by the SoC device model but which live in the source files
with the combiner and GIC models. This series moves the logic to the
SoC device model's source file (because it is really part of the SoC
wiring, not part of the individual combiner or GIC devices) and makes
use of the TYPE_SPLIT_IRQ ability to provide more than 2 output lines
to simplify things so that each interrupt line connects to just one
splitter, whose outputs go to all the places they need to. In the
new setup, these splitter devices clearly belong to the SoC object,
and so they are created as QOM children of it. The Exynos4210Irq
struct ends up unneeded and is deleted.
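
The "one splitter per interrupt line, with as many outputs as needed"
arrangement can be pictured with the following standalone sketch. It
deliberately does not use the QEMU qdev API; DemoSplitter, its callback
type and DEMO_MAX_LINES are hypothetical names chosen for illustration.

```c
#include <assert.h>

#define DEMO_MAX_LINES 8

typedef void (*demo_irq_fn)(void *opaque, int level);

/* Toy splitter: one input fanned out to a configurable number of outputs. */
typedef struct DemoSplitter {
    int num_lines;
    demo_irq_fn fn[DEMO_MAX_LINES];
    void *opaque[DEMO_MAX_LINES];
} DemoSplitter;

/* Wire output n of the splitter to a destination callback. */
static void demo_splitter_connect(DemoSplitter *s, int n, demo_irq_fn fn,
                                  void *opaque)
{
    assert(s->num_lines <= DEMO_MAX_LINES);
    assert(n >= 0 && n < s->num_lines);
    s->fn[n] = fn;
    s->opaque[n] = opaque;
}

/* Drive the input: every connected output sees the same level. */
static void demo_splitter_set(DemoSplitter *s, int level)
{
    for (int i = 0; i < s->num_lines; i++) {
        if (s->fn[i]) {
            s->fn[i](s->opaque[i], level);
        }
    }
}

/* Example destination: just remember the last level seen. */
static void demo_latch(void *opaque, int level)
{
    *(int *)opaque = level;
}
```

A line that must reach the GIC and two combiner inputs would use a
single three-output splitter in this model, rather than a chain of
two-output splits.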

I have also done some conversion of specific child devices of this SoC
from the old-style "call qdev_new()" to the new-style "embed the child
device struct in the parent state struct". I haven't done a complete
conversion, but only touched those devices where making the conversion
was useful for the TYPE_SPLIT_IRQ changes.

I don't have a datasheet for this SoC that describes all the external
combiner and external GIC wiring, so I have mostly kept the QEMU
behaviour the same as it is currently. In a few places, however, I
have fixed what seem to me to be fairly clearly bugs in the current
handling. (Largely these bugs weren't visible to the guest because
we weren't actually connecting up devices to the affected bits of
the interrupt line wiring.)

I've tested this with a simple Linux image, which I think is basically
the same one as the 'make check-acceptance' test. If anybody has
access to other test images that would be interesting.

thanks
-- PMM

Peter Maydell (18):
  hw/arm/exynos4210: Use TYPE_OR_IRQ instead of custom OR-gate device
  hw/intc/exynos4210_gic: Remove unused TYPE_EXYNOS4210_IRQ_GATE
  hw/arm/exynos4210: Put a9mpcore device into state struct
  hw/arm/exynos4210: Drop int_gic_irq[] from Exynos4210Irq struct
  hw/arm/exynos4210: Coalesce board_irqs and irq_table
  hw/arm/exynos4210: Fix code style nit in combiner_grp_to_gic_id[]
  hw/arm/exynos4210: Move exynos4210_init_board_irqs() into exynos4210.c
  hw/arm/exynos4210: Put external GIC into state struct
  hw/arm/exynos4210: Drop ext_gic_irq[] from Exynos4210Irq struct
  hw/arm/exynos4210: Move exynos4210_combiner_get_gpioin() into
exynos4210.c
  hw/arm/exynos4210: Delete unused macro definitions
  hw/arm/exynos4210: Use TYPE_SPLIT_IRQ in exynos4210_init_board_irqs()
  hw/arm/exynos4210: Fill in irq_table[] for internal-combiner-only IRQ
lines
  hw/arm/exynos4210: Connect MCT_G0 and MCT_G1 to both combiners
  hw/arm/exynos4210: Don't connect multiple lines to external GIC inputs
  hw/arm/exynos4210: Fold combiner splits into
exynos4210_init_board_irqs()
  hw/arm/exynos4210: Put combiners into state struct
  hw/arm/exynos4210: Drop Exynos4210Irq struct

 include/hw/arm/exynos4210.h   |  50 ++-
 include/hw/intc/exynos4210_combiner.h |  57 
 include/hw/intc/exynos4210_gic.h  |  43 +++
 hw/arm/exynos4210.c   | 430 +++---
 hw/intc/exynos4210_combiner.c | 108 +--
 hw/intc/exynos4210_gic.c  | 344 +
 MAINTAINERS   |   2 +-
 7 files changed, 508 insertions(+), 526 deletions(-)
 create mode 100644 include/hw/intc/exynos4210_combiner.h
 create mode 100644 include/hw/intc/exynos4210_gic.h

-- 
2.25.1

[PATCH v9 27/45] hw/cxl/host: Add support for CXL Fixed Memory Windows.

2022-04-04 Thread Jonathan Cameron via

From: Jonathan Cameron 

The concept of these is introduced in [1] in terms of their
description in the CEDT ACPI table. The principle is more general.
Unlike once traffic hits the CXL root bridges, the host system
memory address routing is implementation defined and effectively
static once observable by standard / generic system software.
Each CXL Fixed Memory Window (CFMW) is a region of PA space
which has fixed system dependent routing configured so that
accesses can be routed to the CXL devices below a set of target
root bridges. The accesses may be interleaved across multiple
root bridges.

For QEMU we could have fully specified these regions in terms
of a base PA + size, but as the absolute address does not matter
it is simpler to let individual platforms place the memory regions.

Examples:
-cxl-fixed-memory-window targets.0=cxl.0,size=128G
-cxl-fixed-memory-window targets.0=cxl.1,size=128G
-cxl-fixed-memory-window targets.0=cxl.0,targets.1=cxl.1,size=256G,interleave-granularity=2k

Specifies
* 2x 128G regions not interleaved across root bridges, one for each of
  the root bridges with ids cxl.0 and cxl.1
* 256G region interleaved across root bridges with ids cxl.0 and cxl.1
with a 2k interleave granularity.

When system software enumerates the devices below a given root bridge
it can then decide which CFMW to use. If no interleaving is desired
(or possible) it can use the appropriate CFMW for the root bridge in
question.  If there are suitable devices to interleave across the
two root bridges then it may use the 3rd CFMW.
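
As a rough illustration of what "interleaved across root bridges with a 2k granularity" means for the third window, the sketch below picks a target for an offset into the window. It assumes plain modulo interleaving (consecutive granules rotate round-robin over the targets); the function name cfmw_target_index() is hypothetical and this is not the decode code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick which interleave target serves a given offset into a window.
 * Assumption: simple modulo interleave, i.e. granule N goes to
 * target N % num_targets.
 */
unsigned cfmw_target_index(uint64_t offset, uint64_t granularity,
                           unsigned num_targets)
{
    /* Guard against division by zero on bad configuration. */
    assert(granularity != 0 && num_targets != 0);
    return (unsigned)((offset / granularity) % num_targets);
}
```

For a 2-way window with 2k granularity, offsets 0..2047 land on target 0, 2048..4095 on target 1, and 4096 wraps back to target 0. A non-interleaved window is just the num_targets == 1 case.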

A number of other designs were considered but the following constraints
made it hard to adapt existing QEMU approaches to this particular problem.
1) The size must be known before a specific architecture / board brings
   up its PA memory map.  We need to set up an appropriate region.
2) Using links to the host bridges provides a clean command line interface
   but these links cannot be established until command line devices have
   been added.

Hence the two step process used here of first establishing the size,
interleave-ways and granularity + caching the ids of the host bridges
and then, once available finding the actual host bridges so they can
be used later to support interleave decoding.
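
The two step process can be sketched in isolation as below. This is a simplified standalone model, not the patch's implementation: Window, HostBridge, window_config() and window_link_targets() are illustrative names, and the lookup is a plain string match over an array rather than a QOM path resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TARGETS 8

/* Hypothetical host bridge record, only available once devices exist. */
typedef struct HostBridge {
    const char *id;
} HostBridge;

typedef struct Window {
    const char *target_ids[MAX_TARGETS];  /* step 1: cached names */
    HostBridge *target_hbs[MAX_TARGETS];  /* step 2: resolved links */
    size_t num_targets;
} Window;

/* Step 1: at option parsing time, only record the ids. */
int window_config(Window *w, const char *const *ids, size_t n)
{
    assert(w != NULL);
    if (n == 0 || n > MAX_TARGETS) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        w->target_ids[i] = ids[i];
        w->target_hbs[i] = NULL;
    }
    w->num_targets = n;
    return 0;
}

/* Step 2: once host bridges exist, turn each cached id into a pointer. */
int window_link_targets(Window *w, HostBridge *hbs, size_t num_hbs)
{
    for (size_t i = 0; i < w->num_targets; i++) {
        w->target_hbs[i] = NULL;
        for (size_t j = 0; j < num_hbs; j++) {
            if (strcmp(w->target_ids[i], hbs[j].id) == 0) {
                w->target_hbs[i] = &hbs[j];
                break;
            }
        }
        if (!w->target_hbs[i]) {
            return -1;  /* id names a host bridge that was never created */
        }
    }
    return 0;
}
```

Splitting it this way means the window size is known early enough for the board memory map, while an unknown id is still reported, just later.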

[1] CXL 2.0 ECN: CEDT CFMWS & QTG DSM (computeexpresslink.org / specifications)

Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-host-stubs.c | 14 ++
 hw/cxl/cxl-host.c   | 94 +
 hw/cxl/meson.build  |  6 +++
 include/hw/cxl/cxl.h| 21 +
 qapi/machine.json   | 21 +
 qemu-options.hx | 38 +
 softmmu/vl.c| 47 +
 7 files changed, 241 insertions(+)

diff --git a/hw/cxl/cxl-host-stubs.c b/hw/cxl/cxl-host-stubs.c
new file mode 100644
index 0000000000..f8fd278d5d
--- /dev/null
+++ b/hw/cxl/cxl-host-stubs.c
@@ -0,0 +1,14 @@
+/*
+ * CXL host parameter parsing routine stubs
+ *
+ * Copyright (c) 2022 Huawei
+ */
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/cxl/cxl.h"
+
+void cxl_fixed_memory_window_config(MachineState *ms,
+CXLFixedMemoryWindowOptions *object,
+Error **errp) {};
+
+void cxl_fixed_memory_window_link_targets(Error **errp) {};
diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
new file mode 100644
index 0000000000..ec5a75cbf5
--- /dev/null
+++ b/hw/cxl/cxl-host.c
@@ -0,0 +1,94 @@
+/*
+ * CXL host parameter parsing routines
+ *
+ * Copyright (c) 2022 Huawei
+ * Modeled loosely on the NUMA options handling in hw/core/numa.c
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/bitmap.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "sysemu/qtest.h"
+#include "hw/boards.h"
+
+#include "qapi/qapi-visit-machine.h"
+#include "hw/cxl/cxl.h"
+
+void cxl_fixed_memory_window_config(MachineState *ms,
+CXLFixedMemoryWindowOptions *object,
+Error **errp)
+{
+CXLFixedWindow *fw = g_malloc0(sizeof(*fw));
+strList *target;
+int i;
+
+for (target = object->targets; target; target = target->next) {
+fw->num_targets++;
+}
+
+fw->enc_int_ways = cxl_interleave_ways_enc(fw->num_targets, errp);
+if (*errp) {
+return;
+}
+
+fw->targets = g_malloc0_n(fw->num_targets, sizeof(*fw->targets));
+for (i = 0, target = object->targets; target; i++, target = target->next) {
+/* This link cannot be resolved yet, so stash the name for now */
+fw->targets[i] = g_strdup(target->value);
+}
+
+if (object->size % (256 * MiB)) {
+error_setg(errp,
+   "Size of a CXL fixed memory window must be a multiple of 256MiB");
+return;
+}
+fw->size = object->size;
+
+if (object->has_interleave_granularity) {
+fw->enc_int_gran =
+

[PATCH for-7.1 07/18] hw/arm/exynos4210: Move exynos4210_init_board_irqs() into exynos4210.c

2022-04-04 Thread Peter Maydell

The function exynos4210_init_board_irqs() currently lives in
exynos4210_gic.c, but it isn't really part of the exynos4210.gic
device -- it is a function that implements (some of) the wiring up of
interrupts between the SoC's GIC and combiner components.  This means
it fits better in exynos4210.c, which is the SoC-level code.  Move it
there. Similarly, exynos4210_get_irq() is used almost only in the
SoC-level code, so move it too.

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h |   4 -
 hw/arm/exynos4210.c | 202 +++
 hw/intc/exynos4210_gic.c| 204 
 3 files changed, 202 insertions(+), 208 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index a9f186370ee..d83e96a091e 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -111,10 +111,6 @@ OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210State, EXYNOS4210_SOC)
 void exynos4210_write_secondary(ARMCPU *cpu,
 const struct arm_boot_info *info);
 
-/* Initialize board IRQs.
- * These IRQs contain splitted Int/External Combiner and External Gic IRQs */
-void exynos4210_init_board_irqs(Exynos4210State *s);
-
 /* Get IRQ number from exynos4210 IRQ subsystem stub.
  * To identify IRQ source use internal combiner group and bit number
  *  grp - group number
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 11e321d7830..742666ba779 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -101,6 +101,208 @@
 #define EXYNOS4210_PL330_BASE1_ADDR 0x12690000
 #define EXYNOS4210_PL330_BASE2_ADDR 0x12850000
 
+enum ExtGicId {
+EXT_GIC_ID_MDMA_LCD0 = 66,
+EXT_GIC_ID_PDMA0,
+EXT_GIC_ID_PDMA1,
+EXT_GIC_ID_TIMER0,
+EXT_GIC_ID_TIMER1,
+EXT_GIC_ID_TIMER2,
+EXT_GIC_ID_TIMER3,
+EXT_GIC_ID_TIMER4,
+EXT_GIC_ID_MCT_L0,
+EXT_GIC_ID_WDT,
+EXT_GIC_ID_RTC_ALARM,
+EXT_GIC_ID_RTC_TIC,
+EXT_GIC_ID_GPIO_XB,
+EXT_GIC_ID_GPIO_XA,
+EXT_GIC_ID_MCT_L1,
+EXT_GIC_ID_IEM_APC,
+EXT_GIC_ID_IEM_IEC,
+EXT_GIC_ID_NFC,
+EXT_GIC_ID_UART0,
+EXT_GIC_ID_UART1,
+EXT_GIC_ID_UART2,
+EXT_GIC_ID_UART3,
+EXT_GIC_ID_UART4,
+EXT_GIC_ID_MCT_G0,
+EXT_GIC_ID_I2C0,
+EXT_GIC_ID_I2C1,
+EXT_GIC_ID_I2C2,
+EXT_GIC_ID_I2C3,
+EXT_GIC_ID_I2C4,
+EXT_GIC_ID_I2C5,
+EXT_GIC_ID_I2C6,
+EXT_GIC_ID_I2C7,
+EXT_GIC_ID_SPI0,
+EXT_GIC_ID_SPI1,
+EXT_GIC_ID_SPI2,
+EXT_GIC_ID_MCT_G1,
+EXT_GIC_ID_USB_HOST,
+EXT_GIC_ID_USB_DEVICE,
+EXT_GIC_ID_MODEMIF,
+EXT_GIC_ID_HSMMC0,
+EXT_GIC_ID_HSMMC1,
+EXT_GIC_ID_HSMMC2,
+EXT_GIC_ID_HSMMC3,
+EXT_GIC_ID_SDMMC,
+EXT_GIC_ID_MIPI_CSI_4LANE,
+EXT_GIC_ID_MIPI_DSI_4LANE,
+EXT_GIC_ID_MIPI_CSI_2LANE,
+EXT_GIC_ID_MIPI_DSI_2LANE,
+EXT_GIC_ID_ONENAND_AUDI,
+EXT_GIC_ID_ROTATOR,
+EXT_GIC_ID_FIMC0,
+EXT_GIC_ID_FIMC1,
+EXT_GIC_ID_FIMC2,
+EXT_GIC_ID_FIMC3,
+EXT_GIC_ID_JPEG,
+EXT_GIC_ID_2D,
+EXT_GIC_ID_PCIe,
+EXT_GIC_ID_MIXER,
+EXT_GIC_ID_HDMI,
+EXT_GIC_ID_HDMI_I2C,
+EXT_GIC_ID_MFC,
+EXT_GIC_ID_TVENC,
+};
+
+enum ExtInt {
+EXT_GIC_ID_EXTINT0 = 48,
+EXT_GIC_ID_EXTINT1,
+EXT_GIC_ID_EXTINT2,
+EXT_GIC_ID_EXTINT3,
+EXT_GIC_ID_EXTINT4,
+EXT_GIC_ID_EXTINT5,
+EXT_GIC_ID_EXTINT6,
+EXT_GIC_ID_EXTINT7,
+EXT_GIC_ID_EXTINT8,
+EXT_GIC_ID_EXTINT9,
+EXT_GIC_ID_EXTINT10,
+EXT_GIC_ID_EXTINT11,
+EXT_GIC_ID_EXTINT12,
+EXT_GIC_ID_EXTINT13,
+EXT_GIC_ID_EXTINT14,
+EXT_GIC_ID_EXTINT15
+};
+
+/*
+ * External GIC sources which are not from External Interrupt Combiner or
+ * External Interrupts are starting from EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ,
+ * which is INTG16 in Internal Interrupt Combiner.
+ */
+
+static const uint32_t
+combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
+/* int combiner groups 16-19 */
+{ }, { }, { }, { },
+/* int combiner group 20 */
+{ 0, EXT_GIC_ID_MDMA_LCD0 },
+/* int combiner group 21 */
+{ EXT_GIC_ID_PDMA0, EXT_GIC_ID_PDMA1 },
+/* int combiner group 22 */
+{ EXT_GIC_ID_TIMER0, EXT_GIC_ID_TIMER1, EXT_GIC_ID_TIMER2,
+EXT_GIC_ID_TIMER3, EXT_GIC_ID_TIMER4 },
+/* int combiner group 23 */
+{ EXT_GIC_ID_RTC_ALARM, EXT_GIC_ID_RTC_TIC },
+/* int combiner group 24 */
+{ EXT_GIC_ID_GPIO_XB, EXT_GIC_ID_GPIO_XA },
+/* int combiner group 25 */
+{ EXT_GIC_ID_IEM_APC, EXT_GIC_ID_IEM_IEC },
+/* int combiner group 26 */
+{ EXT_GIC_ID_UART0, EXT_GIC_ID_UART1, EXT_GIC_ID_UART2, EXT_GIC_ID_UART3,
+EXT_GIC_ID_UART4 },
+/* int combiner group 27 */
+{ EXT_GIC_ID_I2C0, EXT_GIC_ID_I2C1, EXT_GIC_ID_I2C2, EXT_GIC_ID_I2C3,
+EXT_GIC_ID_I2C4, EXT_GIC_ID_I2C5, EXT_GIC_ID_I2C6,
+EXT_GIC_ID_I2C7 },
+/* int combiner group 28 */
+{ EXT_GIC_ID_SPI0, EXT_GIC_ID_SPI1, EXT_GIC_ID_SPI2 , EXT_GIC_ID_USB_HOST},
+/* int combiner

[PATCH for-7.1 01/18] hw/arm/exynos4210: Use TYPE_OR_IRQ instead of custom OR-gate device

2022-04-04 Thread Peter Maydell

The Exynos4210 SoC device currently uses a custom device
"exynos4210.irq_gate" to model the OR gate that feeds each CPU's IRQ
line.  We have a standard TYPE_OR_IRQ device for this now, so use
that instead.

(This is a migration compatibility break, but that is OK for this
machine type.)
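
For readers unfamiliar with what the OR gate does here, the following is a small standalone model of the behaviour: the output is high whenever any input is high. It is a sketch only, with invented names (OrGate, or_gate_set_input()); it is not the TYPE_OR_IRQ implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define OR_GATE_MAX_LINES 16

typedef struct OrGate {
    bool levels[OR_GATE_MAX_LINES];  /* last level seen on each input */
    unsigned num_lines;              /* number of inputs in use */
} OrGate;

/* Update one input and return the resulting output level. */
bool or_gate_set_input(OrGate *g, unsigned line, bool level)
{
    bool out = false;

    assert(line < g->num_lines);
    g->levels[line] = level;
    for (unsigned i = 0; i < g->num_lines; i++) {
        out = out || g->levels[i];
    }
    return out;
}
```

With two inputs (one per GIC, as in this SoC) the CPU IRQ line stays asserted until both the internal and the external GIC have dropped their outputs.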

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h |  1 +
 hw/arm/exynos4210.c | 31 ---
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index 60b9e126f55..3999034053e 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -102,6 +102,7 @@ struct Exynos4210State {
 MemoryRegion bootreg_mem;
 I2CBus *i2c_if[EXYNOS4210_I2C_NUMBER];
 qemu_or_irq pl330_irq_orgate[EXYNOS4210_NUM_DMA];
+qemu_or_irq cpu_irq_orgate[EXYNOS4210_NCPUS];
 };
 
 #define TYPE_EXYNOS4210_SOC "exynos4210"
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 0299e81f853..dfc0a4eec25 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -205,7 +205,6 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 {
 Exynos4210State *s = EXYNOS4210_SOC(socdev);
 MemoryRegion *system_mem = get_system_memory();
-qemu_irq gate_irq[EXYNOS4210_NCPUS][EXYNOS4210_IRQ_GATE_NINPUTS];
 SysBusDevice *busdev;
 DeviceState *dev, *uart[4], *pl330[3];
 int i, n;
@@ -235,18 +234,13 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 
 /* IRQ Gate */
 for (i = 0; i < EXYNOS4210_NCPUS; i++) {
-dev = qdev_new("exynos4210.irq_gate");
-qdev_prop_set_uint32(dev, "n_in", EXYNOS4210_IRQ_GATE_NINPUTS);
-sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
-/* Get IRQ Gate input in gate_irq */
-for (n = 0; n < EXYNOS4210_IRQ_GATE_NINPUTS; n++) {
-gate_irq[i][n] = qdev_get_gpio_in(dev, n);
-}
-busdev = SYS_BUS_DEVICE(dev);
-
-/* Connect IRQ Gate output to CPU's IRQ line */
-sysbus_connect_irq(busdev, 0,
-   qdev_get_gpio_in(DEVICE(s->cpu[i]), ARM_CPU_IRQ));
+DeviceState *orgate = DEVICE(&s->cpu_irq_orgate[i]);
+object_property_set_int(OBJECT(orgate), "num-lines",
+EXYNOS4210_IRQ_GATE_NINPUTS,
+&error_abort);
+qdev_realize(orgate, NULL, &error_abort);
+qdev_connect_gpio_out(orgate, 0,
+  qdev_get_gpio_in(DEVICE(s->cpu[i]), ARM_CPU_IRQ));
 }
 
 /* Private memory region and Internal GIC */
@@ -256,7 +250,8 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 sysbus_realize_and_unref(busdev, &error_fatal);
 sysbus_mmio_map(busdev, 0, EXYNOS4210_SMP_PRIVATE_BASE_ADDR);
 for (n = 0; n < EXYNOS4210_NCPUS; n++) {
-sysbus_connect_irq(busdev, n, gate_irq[n][0]);
+sysbus_connect_irq(busdev, n,
+   qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 0));
 }
 for (n = 0; n < EXYNOS4210_INT_GIC_NIRQ; n++) {
 s->irqs.int_gic_irq[n] = qdev_get_gpio_in(dev, n);
@@ -275,7 +270,8 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 /* Map Distributer interface */
 sysbus_mmio_map(busdev, 1, EXYNOS4210_EXT_GIC_DIST_BASE_ADDR);
 for (n = 0; n < EXYNOS4210_NCPUS; n++) {
-sysbus_connect_irq(busdev, n, gate_irq[n][1]);
+sysbus_connect_irq(busdev, n,
+   qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 1));
 }
 for (n = 0; n < EXYNOS4210_EXT_GIC_NIRQ; n++) {
 s->irqs.ext_gic_irq[n] = qdev_get_gpio_in(dev, n);
@@ -488,6 +484,11 @@ static void exynos4210_init(Object *obj)
 object_initialize_child(obj, name, orgate, TYPE_OR_IRQ);
 g_free(name);
 }
+
+for (i = 0; i < ARRAY_SIZE(s->cpu_irq_orgate); i++) {
+g_autofree char *name = g_strdup_printf("cpu-irq-orgate%d", i);
+object_initialize_child(obj, name, &s->cpu_irq_orgate[i], TYPE_OR_IRQ);
+}
 }
 
 static void exynos4210_class_init(ObjectClass *klass, void *data)
-- 
2.25.1

Re: [PATCH] [PATCH RFC v2] Implements Backend Program conventions for vhost-user-scsi

2022-04-04 Thread Stefan Hajnoczi

On Mon, 4 Apr 2022 at 15:51, Sakshi Kaushik  wrote:
> I am not able to find vhost-user-scsi inside build/contrib/vhost-user-scsi 
> despite running the 'make' command.

It is probably not being built because the dependencies are not
installed on your machine. Here are the contents of the
contrib/vhost-user-scsi/meson.build file:

  if libiscsi.found()
executable('vhost-user-scsi', files('vhost-user-scsi.c'),
   dependencies: [qemuutil, libiscsi, vhost_user],
   build_by_default: targetos == 'linux',
   install: false)
  endif

The build machine must be a Linux machine and it must have the
libiscsi-dev (Debian/Ubuntu), libiscsi-devel (Fedora/CentOS/RHEL), or
similarly-named package installed. You can run QEMU's ./configure
script and look at the output to see if it detected libiscsi.

Stefan

[PATCH v9 24/45] acpi/cxl: Add _OSC implementation (9.14.2)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

CXL 2.0 specification adds 2 new dwords to the existing _OSC definition
from PCIe. The new dwords are accessed with a new uuid. This
implementation supports what is in the specification.

iasl -d decodes the result of this patch as:

Name (SUPP, Zero)
Name (CTRL, Zero)
Name (SUPC, Zero)
Name (CTRC, Zero)
Method (_OSC, 4, NotSerialized)  // _OSC: Operating System Capabilities
{
CreateDWordField (Arg3, Zero, CDW1)
If (((Arg0 == ToUUID ("33db4d5b-1ff7-401c-9657-7441c03dd766") /* PCI Host Bridge Device */) || (Arg0 == ToUUID ("68f2d50b-c469-4d8a-bd3d-941a103fd3fc") /* Unknown UUID */)))
{
CreateDWordField (Arg3, 0x04, CDW2)
CreateDWordField (Arg3, 0x08, CDW3)
Local0 = CDW3 /* \_SB_.PC0C._OSC.CDW3 */
Local0 &= 0x1F
If ((Arg1 != One))
{
CDW1 |= 0x08
}

If ((CDW3 != Local0))
{
CDW1 |= 0x10
}

SUPP = CDW2 /* \_SB_.PC0C._OSC.CDW2 */
CTRL = CDW3 /* \_SB_.PC0C._OSC.CDW3 */
CDW3 = Local0
If ((Arg0 == ToUUID ("68f2d50b-c469-4d8a-bd3d-941a103fd3fc") /* Unknown UUID */))
{
CreateDWordField (Arg3, 0x0C, CDW4)
CreateDWordField (Arg3, 0x10, CDW5)
SUPC = CDW4 /* \_SB_.PC0C._OSC.CDW4 */
CTRC = CDW5 /* \_SB_.PC0C._OSC.CDW5 */
CDW5 |= One
}

Return (Arg3)
}
Else
{
CDW1 |= 0x04
Return (Arg3)
}

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/acpi/Kconfig   |   5 ++
 hw/acpi/cxl-stub.c|  12 
 hw/acpi/cxl.c | 130 ++
 hw/acpi/meson.build   |   4 +-
 hw/i386/acpi-build.c  |  15 +++--
 include/hw/acpi/cxl.h |  23 
 6 files changed, 183 insertions(+), 6 deletions(-)

diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 19caebde6c..3703aca212 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -5,6 +5,7 @@ config ACPI_X86
 bool
 select ACPI
 select ACPI_NVDIMM
+select ACPI_CXL
 select ACPI_CPU_HOTPLUG
 select ACPI_MEMORY_HOTPLUG
 select ACPI_HMAT
@@ -66,3 +67,7 @@ config ACPI_ERST
 bool
 default y
 depends on ACPI && PCI
+
+config ACPI_CXL
+bool
+depends on ACPI
diff --git a/hw/acpi/cxl-stub.c b/hw/acpi/cxl-stub.c
new file mode 100644
index 0000000000..15bc21076b
--- /dev/null
+++ b/hw/acpi/cxl-stub.c
@@ -0,0 +1,12 @@
+
+/*
+ * Stubs for ACPI platforms that don't support CXL
+ */
+#include "qemu/osdep.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/cxl.h"
+
+void build_cxl_osc_method(Aml *dev)
+{
+g_assert_not_reached();
+}
diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
new file mode 100644
index 0000000000..ca1f04f359
--- /dev/null
+++ b/hw/acpi/cxl.c
@@ -0,0 +1,130 @@
+/*
+ * CXL ACPI Implementation
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu/osdep.h"
+#include "hw/cxl/cxl.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "hw/acpi/cxl.h"
+#include "qapi/error.h"
+#include "qemu/uuid.h"
+
+static Aml *__build_cxl_osc_method(void)
+{
+Aml *method, *if_uuid, *else_uuid, *if_arg1_not_1, *if_cxl, *if_caps_masked;
+Aml *a_ctrl = aml_local(0);
+Aml *a_cdw1 = aml_name("CDW1");
+
+method = aml_method("_OSC", 4, AML_NOTSERIALIZED);
+/* CDW1 is used for the return value so is present whether or not a match occurs */
+aml_append(method, aml_create_dword_field(aml_arg(3), aml_int(0), "CDW1"));
+
+/*
+ * Generate shared section between:
+ * CXL 2.0 - 9.14.2.1.4 and
+ * PCI Firmware Specification 3.0
+ * 4.5.1. _OSC Interface for PCI Host Bridge Devices
+ * The _OSC interface for a PCI/PCI-X/PCI Express hierarchy is
+ * identified by the Universal Unique IDentifier (UUID)
+ * 33DB4D5B-1FF7-401C-9657-7441C03DD766
+ * The _OSC interface for a CXL Host bridge is
+ * identified by the UUID 68F2D50B-C469-4D8A-BD3D-941A103FD3FC
+ * A CXL Host bridge is compatible with a PCI host bridge so
+ * for the shared section match both.
+ */
+if_uuid = aml_if(
+aml_lor(aml_equal(aml_arg(0),
+  aml_touuid("33DB4D5B-1FF7-401C-9657-7441C03DD766")),
+

[PATCH v9 44/45] pci-bridge/cxl_downstream: Add a CXL switch downstream port

2022-04-04 Thread Jonathan Cameron via

Emulation of a simple CXL Switch downstream port.
The Device ID has been allocated for this use.

Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-host.c  |  43 +-
 hw/pci-bridge/cxl_downstream.c | 244 +
 hw/pci-bridge/meson.build  |   2 +-
 3 files changed, 286 insertions(+), 3 deletions(-)

diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
index 469b3c4ced..317f5a37ca 100644
--- a/hw/cxl/cxl-host.c
+++ b/hw/cxl/cxl-host.c
@@ -130,8 +130,9 @@ static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr,
 
 static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
 {
-CXLComponentState *hb_cstate;
+CXLComponentState *hb_cstate, *usp_cstate;
 PCIHostState *hb;
+CXLUpstreamPort *usp;
 int rb_index;
 uint32_t *cache_mem;
 uint8_t target;
@@ -165,8 +166,46 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
 }
 
 d = pci_bridge_get_sec_bus(PCI_BRIDGE(rp))->devices[0];
+if (!d) {
+return NULL;
+}
+
+if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) {
+return d;
+}
+
+/*
+ * Could also be a switch.  Note only one level of switching currently
+ * supported.
+ */
+if (!object_dynamic_cast(OBJECT(d), TYPE_CXL_USP)) {
+return NULL;
+}
+usp = CXL_USP(d);
+
+usp_cstate = cxl_usp_to_cstate(usp);
+if (!usp_cstate) {
+return NULL;
+}
+
+cache_mem = usp_cstate->crb.cache_mem_registers;
+
+target_found = cxl_hdm_find_target(cache_mem, addr, &target);
+if (!target_found) {
+return NULL;
+}
+
+d = pcie_find_port_by_pn(&PCI_BRIDGE(d)->sec_bus, target);
+if (!d) {
+return NULL;
+}
+
+d = pci_bridge_get_sec_bus(PCI_BRIDGE(d))->devices[0];
+if (!d) {
+return NULL;
+}
 
-if (!d || !object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) {
+if (!object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) {
 return NULL;
 }
 
diff --git a/hw/pci-bridge/cxl_downstream.c b/hw/pci-bridge/cxl_downstream.c
new file mode 100644
index 0000000000..641593203e
--- /dev/null
+++ b/hw/pci-bridge/cxl_downstream.c
@@ -0,0 +1,244 @@
+/*
+ * Emulated CXL Switch Downstream Port
+ *
+ * Copyright (c) 2022 Huawei Technologies.
+ *
+ * Based on xio31130_downstream.c
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/pcie.h"
+#include "hw/pci/pcie_port.h"
+#include "qapi/error.h"
+
+typedef struct CXLDownStreamPort {
+/*< private >*/
+PCIESlot parent_obj;
+
+/*< public >*/
+CXLComponentState cxl_cstate;
+} CXLDownstreamPort;
+
+#define TYPE_CXL_DSP "cxl-downstream"
+DECLARE_INSTANCE_CHECKER(CXLDownstreamPort, CXL_DSP, TYPE_CXL_DSP)
+
+#define CXL_DOWNSTREAM_PORT_MSI_OFFSET 0x70
+#define CXL_DOWNSTREAM_PORT_MSI_NR_VECTOR 1
+#define CXL_DOWNSTREAM_PORT_EXP_OFFSET 0x90
+#define CXL_DOWNSTREAM_PORT_AER_OFFSET 0x100
+#define CXL_DOWNSTREAM_PORT_DVSEC_OFFSET\
+(CXL_DOWNSTREAM_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
+
+static void latch_registers(CXLDownstreamPort *dsp)
+{
+uint32_t *reg_state = dsp->cxl_cstate.crb.cache_mem_registers;
+
+cxl_component_register_init_common(reg_state, CXL2_DOWNSTREAM_PORT);
+}
+
+/* TODO: Look at sharing this code across all CXL port types */
+static void cxl_dsp_dvsec_write_config(PCIDevice *dev, uint32_t addr,
+  uint32_t val, int len)
+{
+CXLDownstreamPort *dsp = CXL_DSP(dev);
+CXLComponentState *cxl_cstate = &dsp->cxl_cstate;
+
+if (range_contains(&cxl_cstate->dvsecs[EXTENSIONS_PORT_DVSEC], addr)) {
+uint8_t *reg = &dev->config[addr];
+addr -= cxl_cstate->dvsecs[EXTENSIONS_PORT_DVSEC].lob;
+if (addr == PORT_CONTROL_OFFSET) {
+if (pci_get_word(reg) & PORT_CONTROL_UNMASK_SBR) {
+/* unmask SBR */
+qemu_log_mask(LOG_UNIMP, "SBR mask control is not supported\n");
+}
+if (pci_get_word(reg) & PORT_CONTROL_ALT_MEMID_EN) {
+/* Alt Memory & ID Space Enable */
+qemu_log_mask(LOG_UNIMP,
+  "Alt Memory & ID space is not supported\n");
+
+}
+}
+}
+}
+
+static void cxl_dsp_config_write(PCIDevice *d, uint32_t address,
+ uint32_t val, int len)
+{
+uint16_t slt_ctl, slt_sta;
+
+pcie_cap_slot_get(d, &slt_ctl, &slt_sta);
+pci_bridge_write_config(d, address, val, len);
+pcie_cap_flr_write_config(d, address, val, len);
+pcie_cap_slot_write_config(d, slt_ctl, slt_sta, address, val, len);
+pcie_aer_write_config(d, address, val, len);
+
+cxl_dsp_dvsec_write_config(d, address, val, len);
+}
+
+static void cxl_dsp_reset(DeviceState *qdev)
+{
+PCIDevice *d = PCI_DEVICE(qdev);
+CXLDownstreamPort *dsp = CXL_DSP(qdev);
+
+pcie_cap_deverr_reset(d);
+pcie_cap_slot_reset(d);
+

[PATCH for-7.1 02/18] hw/intc/exynos4210_gic: Remove unused TYPE_EXYNOS4210_IRQ_GATE

2022-04-04 Thread Peter Maydell

Now we have removed the only use of TYPE_EXYNOS4210_IRQ_GATE we can
delete the device entirely.

Signed-off-by: Peter Maydell 
---
 hw/intc/exynos4210_gic.c | 107 ---
 1 file changed, 107 deletions(-)

diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
index bc73d1f1152..794f6b5ac72 100644
--- a/hw/intc/exynos4210_gic.c
+++ b/hw/intc/exynos4210_gic.c
@@ -373,110 +373,3 @@ static void exynos4210_gic_register_types(void)
 }
 
 type_init(exynos4210_gic_register_types)
-
-/* IRQ OR Gate struct.
- *
- * This device models an OR gate. There are n_in input qdev gpio lines and one
- * output sysbus IRQ line. The output IRQ level is formed as OR between all
- * gpio inputs.
- */
-
-#define TYPE_EXYNOS4210_IRQ_GATE "exynos4210.irq_gate"
-OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210IRQGateState, EXYNOS4210_IRQ_GATE)
-
-struct Exynos4210IRQGateState {
-SysBusDevice parent_obj;
-
-uint32_t n_in;  /* inputs amount */
-uint32_t *level;/* input levels */
-qemu_irq out;   /* output IRQ */
-};
-
-static Property exynos4210_irq_gate_properties[] = {
-DEFINE_PROP_UINT32("n_in", Exynos4210IRQGateState, n_in, 1),
-DEFINE_PROP_END_OF_LIST(),
-};
-
-static const VMStateDescription vmstate_exynos4210_irq_gate = {
-.name = "exynos4210.irq_gate",
-.version_id = 2,
-.minimum_version_id = 2,
-.fields = (VMStateField[]) {
-VMSTATE_VBUFFER_UINT32(level, Exynos4210IRQGateState, 1, NULL, n_in),
-VMSTATE_END_OF_LIST()
-}
-};
-
-/* Process a change in IRQ input. */
-static void exynos4210_irq_gate_handler(void *opaque, int irq, int level)
-{
-Exynos4210IRQGateState *s = (Exynos4210IRQGateState *)opaque;
-uint32_t i;
-
-assert(irq < s->n_in);
-
-s->level[irq] = level;
-
-for (i = 0; i < s->n_in; i++) {
-if (s->level[i] >= 1) {
-qemu_irq_raise(s->out);
-return;
-}
-}
-
-qemu_irq_lower(s->out);
-}
-
-static void exynos4210_irq_gate_reset(DeviceState *d)
-{
-Exynos4210IRQGateState *s = EXYNOS4210_IRQ_GATE(d);
-
-memset(s->level, 0, s->n_in * sizeof(*s->level));
-}
-
-/*
- * IRQ Gate initialization.
- */
-static void exynos4210_irq_gate_init(Object *obj)
-{
-Exynos4210IRQGateState *s = EXYNOS4210_IRQ_GATE(obj);
-SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
-
-sysbus_init_irq(sbd, &s->out);
-}
-
-static void exynos4210_irq_gate_realize(DeviceState *dev, Error **errp)
-{
-Exynos4210IRQGateState *s = EXYNOS4210_IRQ_GATE(dev);
-
-/* Allocate general purpose input signals and connect a handler to each of
- * them */
-qdev_init_gpio_in(dev, exynos4210_irq_gate_handler, s->n_in);
-
-s->level = g_malloc0(s->n_in * sizeof(*s->level));
-}
-
-static void exynos4210_irq_gate_class_init(ObjectClass *klass, void *data)
-{
-DeviceClass *dc = DEVICE_CLASS(klass);
-
-dc->reset = exynos4210_irq_gate_reset;
-dc->vmsd = &vmstate_exynos4210_irq_gate;
-device_class_set_props(dc, exynos4210_irq_gate_properties);
-dc->realize = exynos4210_irq_gate_realize;
-}
-
-static const TypeInfo exynos4210_irq_gate_info = {
-.name  = TYPE_EXYNOS4210_IRQ_GATE,
-.parent= TYPE_SYS_BUS_DEVICE,
-.instance_size = sizeof(Exynos4210IRQGateState),
-.instance_init = exynos4210_irq_gate_init,
-.class_init= exynos4210_irq_gate_class_init,
-};
-
-static void exynos4210_irq_gate_register_types(void)
-{
-type_register_static(&exynos4210_irq_gate_info);
-}
-
-type_init(exynos4210_irq_gate_register_types)
-- 
2.25.1

[PATCH v9 45/45] docs/cxl: Add switch documentation

2022-04-04 Thread Jonathan Cameron via

Switches were already introduced, but now that we support them, update
the documentation to provide an example in diagram and
qemu command line parameter forms.

Signed-off-by: Jonathan Cameron 
---
 docs/system/devices/cxl.rst | 88 -
 1 file changed, 86 insertions(+), 2 deletions(-)

diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
index 9293cbf01a..abf7c1f243 100644
--- a/docs/system/devices/cxl.rst
+++ b/docs/system/devices/cxl.rst
@@ -118,8 +118,6 @@ and associated component register access via PCI bars.
 
 CXL Switch
 ~~~~~~~~~~
-Not yet implemented in QEMU.
-
 Here we consider a simple CXL switch with only a single
 virtual hierarchy. Whilst more complex devices exist, their
 visibility to a particular host is generally the same as for
@@ -137,6 +135,10 @@ BARs.  The Upstream Port has the configuration interfaces for
 the HDM decoders which route incoming memory accesses to the
 appropriate downstream port.
 
+A CXL switch is created in a similar fashion to PCI switches
+by creating an upstream port (cxl-upstream) and a number of
+downstream ports on the internal switch bus (cxl-downstream).
+
 CXL Memory Devices - Type 3
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 CXL type 3 devices use a PCI class code and are intended to be supported
@@ -240,6 +242,62 @@ Notes:
 they will take the Host Physical Addresses of accesses and map
 them to their own local Device Physical Address Space (DPA).
 
+Example topology involving a switch::
+
+  |<--SYSTEM PHYSICAL ADDRESS MAP (1)->|
+  |__   __   __|
+  |   |  | |  | |  |   |
+  |   | CFMW 0   | |  CXL Fixed Memory Window 1   | | CFMW 1   |   |
+  |   | HB0 only | |  Configured to interleave memory | | HB1 only |   |
+  |   |  | |  memory accesses across HB0/HB1  | |  |   |
+  |   |x_| |__| |__|   |
+   | | | |
+   | | | |
+   | | |
+  Interleave Decoder | | |
+   Matches this HB   | | |
+   \_| |_/
+   __|__  _|___
+  | || |
+  | CXL HB 0|| CXL HB 1|
+  | HB IntLv Decoders   || HB IntLv Decoders   |
+  | PCI/CXL Root Bus 0c || PCI/CXL Root Bus 0d |
+  | || |
+  |___x_||_|
+  |  |  |   |
+  |
+   A HB 0 HDM Decoder
+   matches this Port
+   ___|___
+  |  Root Port 0  |
+  |  Appears in   |
+  |  PCI topology |
+  |  As 0c:00.0   |
+  |___x___|
+  |
+  |
+  \_
+|
+|
+---
+   |Switch 0  USP as PCI 0d:00.0   |
+   |USP has HDM decoder which directs traffic to|
+   |appropriate downstream port |
+   |Switch BUS appears as 0e   |
+   |x__|
+|  |   |  |
+|  |   |  |
+   _|_   __|__   __|_   __|___
+   (4)| x | | | || |  |
+  | CXL Type3 0   | | CXL Type3 1 | | CXL Type3 2| | CXL Type3 3  |
+  |   | | | || |  |
+  | PMEM0(Vol LSA)| | PMEM1 (...) | | PMEM2 (...)| | PMEM3 (...)  |
+  | Decoder to go | | | || |  |
+  | from host PA  | | PCI 10:00.0 | | PCI 11:00.0| | PCI 12:00.0  |
+  | to device PA  | | | || |  |
+  | PCI as 0f:00.0| | | || |  |
+  |___| |_| || |__|
+
 Example command lines
 ---------------------
 A very simple setup with just one directly attached CXL Type 3 device::
@@ -279,6 +337,32 @@ the CXL Type3 device directly attached (no switches).::
   -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3 \
   -cxl-fixed-memory-window targets.0=cxl.1,targets.1=cxl.2,size=4G,interleave-granularity=8k
 
+An example of 4 devices below a switch suitable for 1,

[PATCH v9 21/45] hw/cxl/device: Implement get/set Label Storage Area (LSA)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

Implement get and set handlers for the Label Storage Area
used to hold data describing persistent memory configuration
so that it can be ensured it is seen in the same configuration
after reboot.
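
The essence of such handlers is a bounds-checked copy in and out of a fixed-size label area. The sketch below shows that shape on its own, outside the mailbox framework. It is illustrative only: Lsa, LSA_SIZE, lsa_get() and lsa_set() are hypothetical names, the size is arbitrary, and the range check is done in 64 bits here so that offset + length cannot wrap (an assumption of this sketch, not a statement about the patch).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LSA_SIZE 1024  /* arbitrary size, for illustration only */

typedef struct Lsa {
    uint8_t data[LSA_SIZE];
} Lsa;

/* Copy len bytes at offset into buf; returns bytes copied, 0 if out of range. */
uint32_t lsa_get(const Lsa *lsa, void *buf, uint32_t offset, uint32_t len)
{
    assert(lsa != NULL && buf != NULL);
    /* 64-bit sum so that offset + len cannot wrap around. */
    if ((uint64_t)offset + len > LSA_SIZE) {
        return 0;
    }
    memcpy(buf, lsa->data + offset, len);
    return len;
}

/* Store len bytes from buf at offset; returns 0 on success, -1 if out of range. */
int lsa_set(Lsa *lsa, const void *buf, uint32_t offset, uint32_t len)
{
    assert(lsa != NULL && buf != NULL);
    if ((uint64_t)offset + len > LSA_SIZE) {
        return -1;
    }
    memcpy(lsa->data + offset, buf, len);
    return 0;
}
```

A label written with lsa_set() and read back with lsa_get() at the same offset comes back unchanged, and any access that would run past the end of the area is rejected rather than truncated.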

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-mailbox-utils.c  | 60 +
 hw/mem/cxl_type3.c  | 56 +-
 include/hw/cxl/cxl_device.h |  5 
 3 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 492739aef3..bb66c765a5 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -57,6 +57,8 @@ enum {
 #define MEMORY_DEVICE 0x0
 CCLS= 0x41,
 #define GET_PARTITION_INFO 0x0
+#define GET_LSA   0x2
+#define SET_LSA   0x3
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -326,7 +328,62 @@ static ret_code cmd_ccls_get_partition_info(struct cxl_cmd 
*cmd,
 return CXL_MBOX_SUCCESS;
 }
 
+static ret_code cmd_ccls_get_lsa(struct cxl_cmd *cmd,
+ CXLDeviceState *cxl_dstate,
+ uint16_t *len)
+{
+struct {
+uint32_t offset;
+uint32_t length;
+} QEMU_PACKED *get_lsa;
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
+uint32_t offset, length;
+
+get_lsa = (void *)cmd->payload;
+offset = get_lsa->offset;
+length = get_lsa->length;
+
+if (offset + length > cvc->get_lsa_size(ct3d)) {
+*len = 0;
+return CXL_MBOX_INVALID_INPUT;
+}
+
+*len = cvc->get_lsa(ct3d, get_lsa, length, offset);
+return CXL_MBOX_SUCCESS;
+}
+
+static ret_code cmd_ccls_set_lsa(struct cxl_cmd *cmd,
+ CXLDeviceState *cxl_dstate,
+ uint16_t *len)
+{
+struct set_lsa_pl {
+uint32_t offset;
+uint32_t rsvd;
+uint8_t data[];
+} QEMU_PACKED;
+struct set_lsa_pl *set_lsa_payload = (void *)cmd->payload;
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
+const size_t hdr_len = offsetof(struct set_lsa_pl, data);
+uint16_t plen = *len;
+
+*len = 0;
+if (!plen) {
+return CXL_MBOX_SUCCESS;
+}
+
+if (set_lsa_payload->offset + plen > cvc->get_lsa_size(ct3d) + hdr_len) {
+return CXL_MBOX_INVALID_INPUT;
+}
+plen -= hdr_len;
+
+cvc->set_lsa(ct3d, set_lsa_payload->data, plen, set_lsa_payload->offset);
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
 
@@ -349,6 +406,9 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_identify_memory_device, 0, 0 },
 [CCLS][GET_PARTITION_INFO] = { "CCLS_GET_PARTITION_INFO",
 cmd_ccls_get_partition_info, 0, 0 },
+[CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 0, 0 },
+[CCLS][SET_LSA] = { "CCLS_SET_LSA", cmd_ccls_set_lsa,
+~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 14d8b0c503..9578e72576 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -8,6 +8,7 @@
 #include "qapi/error.h"
 #include "qemu/log.h"
 #include "qemu/module.h"
+#include "qemu/pmem.h"
 #include "qemu/range.h"
 #include "qemu/rcu.h"
 #include "sysemu/hostmem.h"
@@ -111,6 +112,11 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error 
**errp)
 host_memory_backend_set_mapped(ct3d->hostmem, true);
 ct3d->cxl_dstate.pmem_size = ct3d->hostmem->size;
 
+if (!ct3d->lsa) {
+error_setg(errp, "lsa property must be set");
+return false;
+}
+
 return true;
 }
 
@@ -173,12 +179,58 @@ static void ct3d_reset(DeviceState *dev)
 static Property ct3_props[] = {
 DEFINE_PROP_LINK("memdev", CXLType3Dev, hostmem, TYPE_MEMORY_BACKEND,
  HostMemoryBackend *),
+DEFINE_PROP_LINK("lsa", CXLType3Dev, lsa, TYPE_MEMORY_BACKEND,
+ HostMemoryBackend *),
 DEFINE_PROP_END_OF_LIST(),
 };
 
 static uint64_t get_lsa_size(CXLType3Dev *ct3d)
 {
-return 0;
+MemoryRegion *mr;
+
+mr = host_memory_backend_get_memory(ct3d->lsa);
+return memory_region_size(mr);
+}
+
+static void validate_lsa_access(MemoryRegion *mr, uint64_t size,
+uint64_t offset)
+{
+assert(offset + size <= memory_region_size(mr));
+assert(offset + size > offset);
+}
+
+static uint64_t get_lsa(CXLType3Dev *ct3d, void *buf, uint64_t size,
+uint64_t offset)
+{
+MemoryRegion *mr;
+void *lsa;
+
+mr = host_memory_backend_get_memory(ct3d->lsa);
+

[PATCH v9 42/45] docs/cxl: Add initial Compute eXpress Link (CXL) documentation.

2022-04-04 Thread Jonathan Cameron via

Provide an introduction to the main components of a CXL system,
with detailed explanation of memory interleaving, example command
lines and kernel configuration.

This was a challenging document to write due to the need to extract
only that subset of CXL information which is relevant to either
users of QEMU emulation of CXL or to those interested in the
implementation.  Much of CXL is concerned with specific elements of
the protocol, management of memory pooling etc which is simply
not relevant to what is currently planned for CXL emulation
in QEMU.  All comments welcome

Signed-off-by: Jonathan Cameron 
---
 docs/system/device-emulation.rst |   1 +
 docs/system/devices/cxl.rst  | 302 +++
 2 files changed, 303 insertions(+)

diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index 0b3a3d73ad..2da2bd5d64 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -83,6 +83,7 @@ Emulated Devices
:maxdepth: 1
 
devices/can.rst
+   devices/cxl.rst
devices/ivshmem.rst
devices/net.rst
devices/nvme.rst
diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
new file mode 100644
index 00..9293cbf01a
--- /dev/null
+++ b/docs/system/devices/cxl.rst
@@ -0,0 +1,302 @@
+Compute Express Link (CXL)
+==========================
+From the view of a single host, CXL is an interconnect standard that
+targets accelerators and memory devices attached to a CXL host.
+This description will focus on those aspects visible either to
+software running on a QEMU emulated host or to the internals of
+functional emulation. As such, it will skip over many of the
+electrical and protocol elements that would be more of interest
+for real hardware and will dominate more general introductions to CXL.
+It will also completely ignore the fabric management aspects of CXL
+by considering only a single host and a static configuration.
+
+CXL shares many concepts and much of the infrastructure of PCI Express,
+with CXL Host Bridges, which have CXL Root Ports which may be directly
+attached to CXL or PCI End Points. Alternatively there may be CXL Switches
+with CXL and PCI Endpoints attached below them.  In many cases additional
+control and capabilities are exposed via PCI Express interfaces.
+This sharing of interfaces and hence emulation code is reflected
+in how the devices are emulated in QEMU. In most cases the various
+CXL elements are built upon equivalent PCIe devices.
+
+CXL devices support the following interfaces:
+
+* Most conventional PCIe interfaces
+
+  - Configuration space access
+  - BAR mapped memory accesses used for registers and mailboxes.
+  - MSI/MSI-X
+  - AER
+  - DOE mailboxes
+  - IDE
+  - Many other PCI Express defined interfaces.
+
+* Memory operations
+
+  - Equivalent of accessing DRAM / NVDIMMs. Any access / feature
+supported by the host for normal memory should also work for
+CXL attached memory devices.
+
+* Cache operations. These are mostly irrelevant to QEMU emulation as
+  QEMU is not emulating a coherency protocol. Any emulation related
+  to these will be device specific and is out of the scope of this
+  document.
+
+CXL 2.0 Device Types
+--------------------
+CXL 2.0 End Points are often categorized into three types.
+
+**Type 1:** These support coherent caching of host memory.  An example might
+be a crypto accelerator.  May also have device private memory accessible
+via means such as PCI memory reads and writes to BARs.
+
+**Type 2:** These support coherent caching of host memory and host
+managed device memory (HDM) for which the coherency protocol is managed
+by the host. This is a complex topic, so for more information on CXL
+coherency see the CXL 2.0 specification.
+
+**Type 3 Memory devices:**  These devices act as a means of attaching
+additional memory (HDM) to a CXL host including both volatile and
+persistent memory. The CXL topology may support interleaving across a
+number of Type 3 memory devices using HDM Decoders in the host, host
+bridge, switch upstream port and endpoints.
+
+Scope of CXL emulation in QEMU
+------------------------------
+The focus of CXL emulation is CXL revision 2.0 and later. Earlier CXL
+revisions defined a smaller set of features, leaving much of the control
+interface as implementation defined or device specific, making generic
+emulation challenging with host specific firmware being responsible
+for setup and the Endpoints being presented to operating systems
+as Root Complex Integrated End Points. CXL rev 2.0 looks a lot
+more like PCI Express, with fully specified discoverability
+of the CXL topology.
+
+CXL System components
+---------------------
+A CXL system is made up of a Host with a number of 'standard components'
+the control and capabilities of which are discoverable by system software
+using means described in the CXL 2.0 specification.
+
+CXL Fixed Memory Windows (CFMW)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A

[PATCH v9 39/45] qtest/cxl: Add more complex test cases with CFMWs

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

Add CXL Fixed Memory Windows to the CXL tests.

Signed-off-by: Ben Widawsky 
Co-developed-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
---
 tests/qtest/cxl-test.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
index 5f0794e816..079011af6a 100644
--- a/tests/qtest/cxl-test.c
+++ b/tests/qtest/cxl-test.c
@@ -9,11 +9,13 @@
 #include "libqtest-single.h"
 
 #define QEMU_PXB_CMD "-machine q35,cxl=on " \
- "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "
+ "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "  \
+ "-cxl-fixed-memory-window targets.0=cxl.0,size=4G "
 
-#define QEMU_2PXB_CMD "-machine q35,cxl=on " \
+#define QEMU_2PXB_CMD "-machine q35,cxl=on "\
   "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "  \
-  "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 "
+  "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 " \
+  "-cxl-fixed-memory-window 
targets.0=cxl.0,targets.1=cxl.1,size=4G "
 
 #define QEMU_RP "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 "
 
-- 
2.32.0

[PATCH v9 40/45] hw/arm/virt: Basic CXL enablement on pci_expander_bridge instances pxb-cxl

2022-04-04 Thread Jonathan Cameron via

Code based on i386/pc enablement.
The memory layout places space for 16 host bridge register regions after
the GIC_REDIST2 in the extended memmap.
The CFMWs are placed above the extended memmap.

Only create the CEDT table if cxl=on set for the machine.
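
As a rough illustration of the layout arithmetic described above (a standalone sketch, not code from this patch; the function names are hypothetical), the host bridge register space is 16 regions of 64 KiB, and the diff rounds the base of the first fixed memory window up to 256 MiB:

```c
#include <assert.h>
#include <stdint.h>

#define KiB 1024ULL
#define MiB (1024ULL * KiB)

/* Space reserved for host bridge registers: 16 regions of 64 KiB. */
static uint64_t cxl_host_reg_span(void)
{
    return 64 * KiB * 16;
}

/*
 * Round base up to the next multiple of align. This form is only
 * valid when align is a power of two, as 256 MiB is.
 */
static uint64_t round_up_pow2(uint64_t base, uint64_t align)
{
    return (base + align - 1) & ~(align - 1);
}
```

The reserved register space therefore comes to 1 MiB, and a base that is already 256 MiB aligned is left unchanged by the rounding.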

Signed-off-by: Jonathan Cameron 
Signed-off-by: Ben Widawsky 
---
 hw/arm/virt-acpi-build.c | 33 +
 hw/arm/virt.c| 40 +++-
 include/hw/arm/virt.h|  1 +
 3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 449fab0080..86a2f40437 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -39,9 +39,11 @@
 #include "hw/acpi/aml-build.h"
 #include "hw/acpi/utils.h"
 #include "hw/acpi/pci.h"
+#include "hw/acpi/cxl.h"
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/generic_event_device.h"
 #include "hw/acpi/tpm.h"
+#include "hw/cxl/cxl.h"
 #include "hw/pci/pcie_host.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_bus.h"
@@ -157,10 +159,29 @@ static void acpi_dsdt_add_virtio(Aml *scope,
 }
 }
 
+/* Uses local definition of AcpiBuildState so can't easily be common code */
+static void build_acpi0017(Aml *table)
+{
+Aml *dev, *scope, *method;
+
+scope =  aml_scope("_SB");
+dev = aml_device("CXLM");
+aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0017")));
+
+method = aml_method("_STA", 0, AML_NOTSERIALIZED);
+aml_append(method, aml_return(aml_int(0x01)));
+aml_append(dev, method);
+
+aml_append(scope, dev);
+aml_append(table, scope);
+}
+
 static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
   uint32_t irq, VirtMachineState *vms)
 {
 int ecam_id = VIRT_ECAM_ID(vms->highmem_ecam);
+bool cxl_present = false;
+PCIBus *bus = vms->bus;
 struct GPEXConfig cfg = {
 .mmio32 = memmap[VIRT_PCIE_MMIO],
 .pio= memmap[VIRT_PCIE_PIO],
@@ -174,6 +195,14 @@ static void acpi_dsdt_add_pci(Aml *scope, const 
MemMapEntry *memmap,
 }
 
 acpi_dsdt_add_gpex(scope, &cfg);
+QLIST_FOREACH(bus, &vms->bus->child, sibling) {
+if (pci_bus_is_cxl(bus)) {
+cxl_present = true;
+}
+}
+if (cxl_present) {
+build_acpi0017(scope);
+}
 }
 
 static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
@@ -991,6 +1020,10 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
vms->oem_table_id);
 }
 }
+if (ms->cxl_devices_state->is_enabled) {
+cxl_build_cedt(ms, table_offsets, tables_blob, tables->linker,
+   vms->oem_id, vms->oem_table_id);
+}
 
 if (ms->nvdimms_state->is_enabled) {
 nvdimm_build_acpi(table_offsets, tables_blob, tables->linker,
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9969645c0b..9f81b166c0 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -78,6 +78,7 @@
 #include "hw/virtio/virtio-mem-pci.h"
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/char/pl011.h"
+#include "hw/cxl/cxl.h"
 #include "qemu/guest-random.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
@@ -178,6 +179,7 @@ static const MemMapEntry base_memmap[] = {
 static MemMapEntry extended_memmap[] = {
 /* Additional 64 MB redist region (can contain up to 512 redistributors) */
 [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
+[VIRT_CXL_HOST] =   { 0x0, 64 * KiB * 16 }, /* 16 UID */
 [VIRT_HIGH_PCIE_ECAM] = { 0x0, 256 * MiB },
 /* Second PCIe window */
 [VIRT_HIGH_PCIE_MMIO] = { 0x0, 512 * GiB },
@@ -1508,6 +1510,17 @@ static void create_pcie(VirtMachineState *vms)
 }
 }
 
+static void create_cxl_host_reg_region(VirtMachineState *vms)
+{
+MemoryRegion *sysmem = get_system_memory();
+MachineState *ms = MACHINE(vms);
+MemoryRegion *mr = &ms->cxl_devices_state->host_mr;
+
+memory_region_init(mr, OBJECT(ms), "cxl_host_reg",
+   vms->memmap[VIRT_CXL_HOST].size);
+memory_region_add_subregion(sysmem, vms->memmap[VIRT_CXL_HOST].base, mr);
+}
+
 static void create_platform_bus(VirtMachineState *vms)
 {
 DeviceState *dev;
@@ -1670,7 +1683,7 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 {
 MachineState *ms = MACHINE(vms);
-hwaddr base, device_memory_base, device_memory_size, memtop;
+hwaddr base, device_memory_base, device_memory_size, memtop, cxl_fmw_base;
 int i;
 
 vms->memmap = extended_memmap;
@@ -1762,6 +1775,20 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 memory_region_init(&ms->device_memory->mr, OBJECT(vms),
"device-memory", device_memory_size);
 }
+
+if (ms->cxl_devices_state->fixed_windows) {
+GList *it;
+
+cxl_fmw_base = ROUND_UP(base, 256 * MiB);
+for (it =

[PATCH v9 38/45] tests/acpi: Add tables for CXL emulation.

2022-04-04 Thread Jonathan Cameron via

Tables that differ from normal Q35 tables when running the CXL test.

Signed-off-by: Jonathan Cameron 
---
 tests/data/acpi/q35/CEDT.cxl| Bin 0 -> 184 bytes
 tests/data/acpi/q35/DSDT.cxl| Bin 0 -> 9615 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   2 --
 3 files changed, 2 deletions(-)

diff --git a/tests/data/acpi/q35/CEDT.cxl b/tests/data/acpi/q35/CEDT.cxl
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..b8fa06b00e65712e91e0a5ea0d9277e0146d1c00
 100644
GIT binary patch
literal 184
[base85-encoded binary patch data omitted]

literal 0
HcmV?d1

diff --git a/tests/data/acpi/q35/DSDT.cxl b/tests/data/acpi/q35/DSDT.cxl
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..c1206defed0154e9024702bba88453b2790a306d
 100644
GIT binary patch
literal 9615
[base85-encoded binary patch data omitted]

[PATCH v9 32/45] mem/cxl_type3: Add read and write functions for associated hostmem.

2022-04-04 Thread Jonathan Cameron via

From: Jonathan Cameron 

Once a read or write reaches a CXL type 3 device, the HDM decoders
on the device are used to establish the Device Physical Address
which should be accessed.  These functions perform the required maths
and then use a device specific address space to access the
hostmem->mr to fulfil the actual operation.  Note that failed writes
are silent, but failed reads return poison.  Note this is based
loosely on:

https://lore.kernel.org/qemu-devel/20200817161853.593247-6-f4...@amsat.org/
[RFC PATCH 0/9] hw/misc: Add support for interleaved memory accesses

Only lightly tested so far.  More complex test cases yet to be written.
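
For readers following the address maths, here is a minimal standalone sketch of the host offset to DPA step, mirroring the mask-and-shift expression used in cxl_type3_dpa() in the diff below. The names mask64 and hpa_offset_to_dpa are hypothetical, and the sketch assumes the encoded IW field means 2^iw ways (power-of-two interleave only) and that the granularity is 256 << ig bytes, in line with cxl_decode_ig():

```c
#include <assert.h>
#include <stdint.h>

/* Mask of `length` bits starting at bit `shift` (1 <= length <= 64). */
static uint64_t mask64(unsigned shift, unsigned length)
{
    return (~0ULL >> (64 - length)) << shift;
}

/*
 * Hypothetical helper: strip the interleave-selection bits out of an
 * offset into the decoder's host address range.
 *   ig: encoded granularity, 256 << ig bytes per chunk
 *   iw: encoded ways, assumed here to mean 1 << iw devices
 * Bits below the granularity pass through unchanged, the iw bits just
 * above them choose the target device and are dropped, and the
 * remaining upper bits shift down to close the gap.
 */
static uint64_t hpa_offset_to_dpa(uint64_t hpa_offset, unsigned ig,
                                  unsigned iw)
{
    uint64_t low = hpa_offset & mask64(0, 8 + ig);
    uint64_t high = hpa_offset & mask64(8 + ig + iw, 64 - 8 - ig - iw);

    return low | (high >> iw);
}
```

With 2-way interleave at 256 byte granularity (ig = 0, iw = 1), host offset 0x300 maps to device offset 0x100: bit 8 selected the device and is removed, and bit 9 moves down into its place.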

Signed-off-by: Jonathan Cameron 
---
 hw/mem/cxl_type3.c  | 91 +
 include/hw/cxl/cxl_device.h |  6 +++
 2 files changed, 97 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 9578e72576..53fd57579b 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -95,7 +95,9 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, 
uint64_t value,
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
+DeviceState *ds = DEVICE(ct3d);
 MemoryRegion *mr;
+char *name;
 
 if (!ct3d->hostmem) {
 error_setg(errp, "memdev property must be set");
@@ -110,6 +112,15 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error 
**errp)
 memory_region_set_nonvolatile(mr, true);
 memory_region_set_enabled(mr, true);
 host_memory_backend_set_mapped(ct3d->hostmem, true);
+
+if (ds->id) {
+name = g_strdup_printf("cxl-type3-dpa-space:%s", ds->id);
+} else {
+name = g_strdup("cxl-type3-dpa-space");
+}
+address_space_init(&ct3d->hostmem_as, mr, name);
+g_free(name);
+
 ct3d->cxl_dstate.pmem_size = ct3d->hostmem->size;
 
 if (!ct3d->lsa) {
@@ -165,6 +176,86 @@ static void ct3_exit(PCIDevice *pci_dev)
 ComponentRegisters *regs = &cxl_cstate->crb;
 
 g_free(regs->special_ops);
+address_space_destroy(&ct3d->hostmem_as);
+}
+
+/* TODO: Support multiple HDM decoders and DPA skip */
+static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa)
+{
+uint32_t *cache_mem = ct3d->cxl_cstate.crb.cache_mem_registers;
+uint64_t decoder_base, decoder_size, hpa_offset;
+uint32_t hdm0_ctrl;
+int ig, iw;
+
+decoder_base = (((uint64_t)cache_mem[R_CXL_HDM_DECODER0_BASE_HI] << 32) |
+cache_mem[R_CXL_HDM_DECODER0_BASE_LO]);
+if ((uint64_t)host_addr < decoder_base) {
+return false;
+}
+
+hpa_offset = (uint64_t)host_addr - decoder_base;
+
+decoder_size = ((uint64_t)cache_mem[R_CXL_HDM_DECODER0_SIZE_HI] << 32) |
+cache_mem[R_CXL_HDM_DECODER0_SIZE_LO];
+if (hpa_offset >= decoder_size) {
+return false;
+}
+
+hdm0_ctrl = cache_mem[R_CXL_HDM_DECODER0_CTRL];
+iw = FIELD_EX32(hdm0_ctrl, CXL_HDM_DECODER0_CTRL, IW);
+ig = FIELD_EX32(hdm0_ctrl, CXL_HDM_DECODER0_CTRL, IG);
+
+*dpa = (MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) |
+((MAKE_64BIT_MASK(8 + ig + iw, 64 - 8 - ig - iw) & hpa_offset) >> iw);
+
+return true;
+}
+
+MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
+   unsigned size, MemTxAttrs attrs)
+{
+CXLType3Dev *ct3d = CXL_TYPE3(d);
+uint64_t dpa_offset;
+MemoryRegion *mr;
+
+/* TODO support volatile region */
+mr = host_memory_backend_get_memory(ct3d->hostmem);
+if (!mr) {
+return MEMTX_ERROR;
+}
+
+if (!cxl_type3_dpa(ct3d, host_addr, &dpa_offset)) {
+return MEMTX_ERROR;
+}
+
+if (dpa_offset > int128_get64(mr->size)) {
+return MEMTX_ERROR;
+}
+
+return address_space_read(&ct3d->hostmem_as, dpa_offset, attrs, data, 
size);
+}
+
+MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
+unsigned size, MemTxAttrs attrs)
+{
+CXLType3Dev *ct3d = CXL_TYPE3(d);
+uint64_t dpa_offset;
+MemoryRegion *mr;
+
+mr = host_memory_backend_get_memory(ct3d->hostmem);
+if (!mr) {
+return MEMTX_OK;
+}
+
+if (!cxl_type3_dpa(ct3d, host_addr, &dpa_offset)) {
+return MEMTX_OK;
+}
+
+if (dpa_offset > int128_get64(mr->size)) {
+return MEMTX_OK;
+}
+return address_space_write(&ct3d->hostmem_as, dpa_offset, attrs,
+   &data, size);
 }
 
 static void ct3d_reset(DeviceState *dev)
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 4285fbda08..1e141b6621 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -239,6 +239,7 @@ struct CXLType3Dev {
 HostMemoryBackend *lsa;
 
 /* State */
+AddressSpace hostmem_as;
 CXLComponentState cxl_cstate;
 CXLDeviceState cxl_dstate;
 };
@@ -259,4 +260,9 @@ struct CXLType3Class {
 uint64_t offset);
 };
 
+MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
+   unsigned

[PATCH v9 20/45] hw/cxl/device: Plumb real Label Storage Area (LSA) sizing

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

This should introduce no change. Subsequent work will make use of this
new class member.
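
The plumbing amounts to a per-class function pointer that callers go through instead of reading a size directly. A minimal plain C sketch of that pattern follows, deliberately not using QOM; all names in it are hypothetical stand-ins for the QEMU types in the diff:

```c
#include <assert.h>
#include <stdint.h>

struct dev;

/* Hypothetical stand-in for a class struct holding a virtual method. */
struct dev_class {
    uint64_t (*get_lsa_size)(struct dev *d);
};

/* Hypothetical stand-in for a device instance pointing at its class. */
struct dev {
    const struct dev_class *klass;
};

/* Placeholder implementation: reports no LSA, like the stub added here. */
static uint64_t stub_lsa_size(struct dev *d)
{
    (void)d;
    return 0;
}

static const struct dev_class stub_class = {
    .get_lsa_size = stub_lsa_size,
};

/* Callers ask the class, so a later patch can swap in a real size. */
static uint64_t identify_lsa_size(struct dev *d)
{
    return d->klass->get_lsa_size(d);
}
```

Because the stub returns 0, wiring it in changes nothing observable, which matches the "no change" claim above.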

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-mailbox-utils.c  |  3 +++
 hw/mem/cxl_type3.c  |  9 +
 include/hw/cxl/cxl_device.h | 11 ++-
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index c8188d7087..492739aef3 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -277,6 +277,8 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd 
*cmd,
 } QEMU_PACKED *id;
 QEMU_BUILD_BUG_ON(sizeof(*id) != 0x43);
 
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
 uint64_t size = cxl_dstate->pmem_size;
 
 if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
@@ -291,6 +293,7 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd 
*cmd,
 
 id->total_capacity = size / (256 << 20);
 id->persistent_capacity = size / (256 << 20);
+id->lsa_size = cvc->get_lsa_size(ct3d);
 
 *len = sizeof(*id);
 return CXL_MBOX_SUCCESS;
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 5c93fbbd9b..14d8b0c503 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -176,10 +176,16 @@ static Property ct3_props[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static uint64_t get_lsa_size(CXLType3Dev *ct3d)
+{
+return 0;
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
+CXLType3Class *cvc = CXL_TYPE3_CLASS(oc);
 
 pc->realize = ct3_realize;
 pc->exit = ct3_exit;
@@ -192,11 +198,14 @@ static void ct3_class_init(ObjectClass *oc, void *data)
 dc->desc = "CXL PMEM Device (Type 3)";
 dc->reset = ct3d_reset;
 device_class_set_props(dc, ct3_props);
+
+cvc->get_lsa_size = get_lsa_size;
 }
 
 static const TypeInfo ct3d_info = {
 .name = TYPE_CXL_TYPE3,
 .parent = TYPE_PCI_DEVICE,
+.class_size = sizeof(struct CXLType3Class),
 .class_init = ct3_class_init,
 .instance_size = sizeof(CXLType3Dev),
 .interfaces = (InterfaceInfo[]) {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index d8da2c7b68..ea2571a69b 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -236,6 +236,7 @@ struct CXLType3Dev {
 
 /* Properties */
 HostMemoryBackend *hostmem;
+HostMemoryBackend *lsa;
 
 /* State */
 CXLComponentState cxl_cstate;
@@ -243,6 +244,14 @@ struct CXLType3Dev {
 };
 
 #define TYPE_CXL_TYPE3 "cxl-type3"
-OBJECT_DECLARE_SIMPLE_TYPE(CXLType3Dev, CXL_TYPE3)
+OBJECT_DECLARE_TYPE(CXLType3Dev, CXLType3Class, CXL_TYPE3)
+
+struct CXLType3Class {
+/* Private */
+PCIDeviceClass parent_class;
+
+/* public */
+uint64_t (*get_lsa_size)(CXLType3Dev *ct3d);
+};
 
 #endif
-- 
2.32.0

Re: [PATCH v1 8/9] qom: add command to print initial properties

2022-04-04 Thread Maxim Davydov




On 3/30/22 18:17, Vladimir Sementsov-Ogievskiy wrote:

29.03.2022 00:15, Maxim Davydov wrote:
The command "query-init-properties" is needed to get values of 
properties
after initialization (not only default value). It makes sense, for 
example,

when working with x86_64-cpu.
All machine types (and x-remote-object, because its init uses machine
type's infrastructure) should be skipped, because only the one 
instance can

be correctly initialized.

Signed-off-by: Maxim Davydov 
---
  qapi/qom.json  |  69 ++
  qom/qom-qmp-cmds.c | 121 +
  2 files changed, 190 insertions(+)

diff --git a/qapi/qom.json b/qapi/qom.json
index eeb5395ff3..1eedc441eb 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -949,3 +949,72 @@
  ##
  { 'command': 'object-del', 'data': {'id': 'str'},
    'allow-preconfig': true }
+
+##
+# @InitValue:
+#
+# Not all objects have default values but they have "initial" values.
+#
+# @name: property name
+#
+# @value: Current value (default or after initialization. It makes 
sense,

+# for example, for x86-cpus)
+#
+# Since: 7.0


7.1 (here and below)


+#
+##
+{ 'struct': 'InitValue',
+  'data': { 'name': 'str',
+    '*value': 'any' } }
+


[..]


diff --git a/qom/qom-qmp-cmds.c b/qom/qom-qmp-cmds.c
index 2d6f41ecc7..c1bb3f1f8b 100644
--- a/qom/qom-qmp-cmds.c
+++ b/qom/qom-qmp-cmds.c
@@ -27,6 +27,7 @@
  #include "qemu/cutils.h"
  #include "qom/object_interfaces.h"
  #include "qom/qom-qobject.h"
+#include "hw/boards.h"
    ObjectPropertyInfoList *qmp_qom_list(const char *path, Error **errp)
  {
@@ -235,3 +236,123 @@ void qmp_object_del(const char *id, Error **errp)
  {
  user_creatable_del(id, errp);
  }
+
+static void query_object_prop(InitValueList **props_list, 
ObjectProperty *prop,

+  Object *obj, Error **errp)
+{
+    InitValue *prop_info = NULL;
+
+    /* Skip inconsiderable properties */
+    if (strcmp(prop->name, "type") == 0 ||
+    strcmp(prop->name, "realized") == 0 ||
+    strcmp(prop->name, "hotpluggable") == 0 ||
+    strcmp(prop->name, "hotplugged") == 0 ||
+    strcmp(prop->name, "parent_bus") == 0) {
+    return;
+    }
+
+    prop_info = g_malloc0(sizeof(*prop_info));
+    prop_info->name = g_strdup(prop->name);
+    prop_info->value = NULL;
+    if (prop->defval) {
+    prop_info->value = qobject_ref(prop->defval);
+    } else if (prop->get) {
+    /*
+ * crash-information in x86-cpu uses errp to return current 
state.

+ * So, after requesting this property it returns GenericError:
+ * "No crash occured"
+ */
+    if (strcmp(prop->name, "crash-information") != 0) {
+    prop_info->value = object_property_get_qobject(obj, 
prop->name,

+ errp);
+    }
+    }


Hmmm. Should we instead call prop->get() when it is available, and 
only if not use prop->defval?

Default properties are rarer and can sometimes give more information (if 
the device developer thought that there should be a default value). And 
I think that if prop->get() isn't available, prop->defval isn't either.



+    prop_info->has_value = !!prop_info->value;
+
+    QAPI_LIST_PREPEND(*props_list, prop_info);
+}
+
+typedef struct QIPData {
+    InitPropsList **dev_list;
+    Error **errp;
+} QIPData;
+
+static void query_init_properties_tramp(gpointer list_data, gpointer 
opaque)

+{
+    ObjectClass *k = list_data;
+    Object *obj;
+    ObjectClass *parent;
+    GHashTableIter iter;
+
+    QIPData *data = opaque;
+    ClassPropertiesList *class_props_list = NULL;
+    InitProps *dev_info;
+
+    /* Only one machine can be initialized correctly (it's already 
happened) */

+    if (object_class_dynamic_cast(k, TYPE_MACHINE)) {
+    return;
+    }
+
+    const char *klass_name = object_class_get_name(k);
+    /*
+ * Uses machine type infrastructure with notifiers. It causes 
immediate

+ * notify and SIGSEGV during remote_object_machine_done
+ */
+    if (strcmp(klass_name, "x-remote-object") == 0) {
+    return;
+    }
+
+    dev_info = g_malloc0(sizeof(*dev_info));
+    dev_info->name = g_strdup(klass_name);
+
+    obj = object_new_with_class(k);
+
+    /*
+ * Part of ObjectPropertyIterator infrastructure, but we need 
more precise

+ * control of current class to dump appropriate features
+ * This part was taken out from loop because first 
initialization differ

+ * from other reinitializations
+ */
+    parent = object_get_class(obj);


hmm.. obj = object_new_with_class(k); parent = 
object_get_class(obj);.. It looks to me like parent should be equal to 
k. Or the object_ API is rather unobvious.

I'll change it)



+    g_hash_table_iter_init(&iter, obj->properties);
+    const char *prop_owner_name = object_get_typename(obj);
+    do {
+    InitValueList *prop_list = NULL;
+    ClassProperties *class_data;
+
+    gpointer key, val;
+    while (g_hash_table_iter_next(&iter,

[PATCH v9 31/45] CXL/cxl_component: Add cxl_get_hb_cstate()

2022-04-04 Thread Jonathan Cameron via

From: Jonathan Cameron 

Accessor to get hold of the cxl state for a CXL host bridge
without exposing the internals of the implementation.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci-bridge/pci_expander_bridge.c | 7 +++
 include/hw/cxl/cxl_component.h  | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index b4813b6851..963fa41a11 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -72,6 +72,13 @@ static GList *pxb_dev_list;
 
 #define TYPE_PXB_HOST "pxb-host"
 
+CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb)
+{
+CXLHost *host = PXB_CXL_HOST(hb);
+
+return &host->cxl_cstate;
+}
+
 static int pxb_bus_num(PCIBus *bus)
 {
 PXBDev *pxb = convert_to_pxb(bus->parent_dev);
diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
index b0f95d3484..779a7b1a97 100644
--- a/include/hw/cxl/cxl_component.h
+++ b/include/hw/cxl/cxl_component.h
@@ -202,4 +202,6 @@ static inline hwaddr cxl_decode_ig(int ig)
 return 1 << (ig + 8);
 }
 
+CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb);
+
 #endif
-- 
2.32.0

[PATCH v9 17/45] hw/cxl/device: Add a memory device (8.2.8.5)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

A CXL memory device (AKA Type 3) is a CXL component that contains some
combination of volatile and persistent memory. It also implements the
previously defined mailbox interface as well as the memory device
firmware interface.

Although the memory device is configured like a normal PCIe device, the
memory traffic is on an entirely separate bus conceptually (using the
same physical wires as PCIe, but different protocol).

Once the CXL topology is fully configured and address decoders committed,
the guest physical address for the memory device is part of a larger
window which is owned by the platform.  The creation of these windows
is later in this series.

The following example will create a 256M device in a 512M window:
-object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
-device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0"

Note: Dropped PCDIMM info interfaces for now.  They can be added if
appropriate at a later date.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-mailbox-utils.c  |  46 +++
 hw/mem/Kconfig  |   5 ++
 hw/mem/cxl_type3.c  | 159 
 hw/mem/meson.build  |   1 +
 include/hw/cxl/cxl_device.h |  15 
 include/hw/cxl/cxl_pci.h|  21 +
 include/hw/pci/pci_ids.h|   1 +
 7 files changed, 248 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index db473135c7..4ae0561dfc 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -50,6 +50,8 @@ enum {
 LOGS= 0x04,
 #define GET_SUPPORTED 0x0
 #define GET_LOG   0x1
+IDENTIFY= 0x40,
+#define MEMORY_DEVICE 0x0
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -214,6 +216,48 @@ static ret_code cmd_logs_get_log(struct cxl_cmd *cmd,
 return CXL_MBOX_SUCCESS;
 }
 
+/* 8.2.9.5.1.1 */
+static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
+   CXLDeviceState *cxl_dstate,
+   uint16_t *len)
+{
+struct {
+char fw_revision[0x10];
+uint64_t total_capacity;
+uint64_t volatile_capacity;
+uint64_t persistent_capacity;
+uint64_t partition_align;
+uint16_t info_event_log_size;
+uint16_t warning_event_log_size;
+uint16_t failure_event_log_size;
+uint16_t fatal_event_log_size;
+uint32_t lsa_size;
+uint8_t poison_list_max_mer[3];
+uint16_t inject_poison_limit;
+uint8_t poison_caps;
+uint8_t qos_telemetry_caps;
+} QEMU_PACKED *id;
+QEMU_BUILD_BUG_ON(sizeof(*id) != 0x43);
+
+uint64_t size = cxl_dstate->pmem_size;
+
+if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+id = (void *)cmd->payload;
+memset(id, 0, sizeof(*id));
+
+/* PMEM only */
+snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
+
+id->total_capacity = size / (256 << 20);
+id->persistent_capacity = size / (256 << 20);
+
+*len = sizeof(*id);
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
@@ -231,6 +275,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8, IMMEDIATE_POLICY_CHANGE },
 [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0, 0 },
 [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
+[IDENTIFY][MEMORY_DEVICE] = { "IDENTIFY_MEMORY_DEVICE",
+cmd_identify_memory_device, 0, 0 },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
index 03dbb3c7df..73c5ae8ad9 100644
--- a/hw/mem/Kconfig
+++ b/hw/mem/Kconfig
@@ -11,3 +11,8 @@ config NVDIMM
 
 config SPARSE_MEM
 bool
+
+config CXL_MEM_DEVICE
+bool
+default y if CXL
+select MEM_DEVICE
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
new file mode 100644
index 00..329a6ea2a9
--- /dev/null
+++ b/hw/mem/cxl_type3.c
@@ -0,0 +1,159 @@
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/error-report.h"
+#include "hw/mem/memory-device.h"
+#include "hw/mem/pc-dimm.h"
+#include "hw/pci/pci.h"
+#include "hw/qdev-properties.h"
+#include "qapi/error.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qemu/range.h"
+#include "qemu/rcu.h"
+#include "sysemu/hostmem.h"
+#include "hw/cxl/cxl.h"
+
+static void build_dvsecs(CXLType3Dev *ct3d)
+{
+CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
+uint8_t *dvsec;
+
+dvsec = (uint8_t *)&(CXLDVSECDevice){
+.cap = 0x1e,
+.ctrl = 0x6,
+.status2 = 0x2,
+.range1_size_hi = ct3d->hostmem->size >> 32,
+.range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+(ct3d->hostmem->size & 0xF0000000),
+
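
The identify handler in this patch reports capacities in units of 256 MiB and rejects sizes that are not a multiple of that unit. A minimal standalone sketch of that arithmetic follows. The function name capacity_units and the -1 error value are hypothetical and are not part of the QEMU code, which returns CXL_MBOX_INTERNAL_ERROR instead.

```c
#include <assert.h>
#include <stdint.h>

#define CAPACITY_UNIT (256ULL << 20) /* 256 MiB */

/*
 * Convert a byte size to the 256 MiB units reported by the identify
 * command. Returns 0 on success, -1 if size is not a multiple of
 * 256 MiB (hypothetical error value for this sketch).
 */
int capacity_units(uint64_t size, uint64_t *units)
{
    assert(units != 0);

    if (size % CAPACITY_UNIT != 0) {
        return -1;
    }
    *units = size / CAPACITY_UNIT;
    return 0;
}
```

With this sketch a 512 MiB backend would be reported as 2 units, and a 300 MiB backend would be rejected.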

[PATCH v9 35/45] i386/pc: Enable CXL fixed memory windows

2022-04-04 Thread Jonathan Cameron via

From: Jonathan Cameron 

Add the CFMWS memory regions to the memory map and adjust the
PCI window to avoid hitting the same memory.

Signed-off-by: Jonathan Cameron 
---
 hw/i386/pc.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index da74f08f9e..48a86ac8a4 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -814,7 +814,7 @@ void pc_memory_init(PCMachineState *pcms,
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(pcms);
-hwaddr cxl_base;
+hwaddr cxl_base, cxl_resv_end = 0;
 
 assert(machine->ram_size == x86ms->below_4g_mem_size +
 x86ms->above_4g_mem_size);
@@ -922,6 +922,24 @@ void pc_memory_init(PCMachineState *pcms,
 e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
 memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
 memory_region_add_subregion(system_memory, cxl_base, mr);
+cxl_resv_end = cxl_base + cxl_size;
+if (machine->cxl_devices_state->fixed_windows) {
+hwaddr cxl_fmw_base;
+GList *it;
+
+cxl_fmw_base = ROUND_UP(cxl_base + cxl_size, 256 * MiB);
+for (it = machine->cxl_devices_state->fixed_windows; it; it = it->next) {
+CXLFixedWindow *fw = it->data;
+
+fw->base = cxl_fmw_base;
+memory_region_init_io(&fw->mr, OBJECT(machine), _ops, fw,
+  "cxl-fixed-memory-region", fw->size);
+memory_region_add_subregion(system_memory, fw->base, &fw->mr);
+e820_add_entry(fw->base, fw->size, E820_RESERVED);
+cxl_fmw_base += fw->size;
+cxl_resv_end = cxl_fmw_base;
+}
+}
 }
 
 /* Initialize PC system firmware */
@@ -951,6 +969,10 @@ void pc_memory_init(PCMachineState *pcms,
 if (!pcmc->broken_reserved_end) {
 res_mem_end += memory_region_size(&machine->device_memory->mr);
 }
+
+if (machine->cxl_devices_state->is_enabled) {
+res_mem_end = cxl_resv_end;
+}
 *val = cpu_to_le64(ROUND_UP(res_mem_end, 1 * GiB));
 fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, sizeof(*val));
 }
@@ -987,6 +1009,13 @@ uint64_t pc_pci_hole64_start(void)
 if (ms->cxl_devices_state->host_mr.addr) {
 hole64_start = ms->cxl_devices_state->host_mr.addr +
 memory_region_size(&ms->cxl_devices_state->host_mr);
+if (ms->cxl_devices_state->fixed_windows) {
+GList *it;
+for (it = ms->cxl_devices_state->fixed_windows; it; it = it->next) {
+CXLFixedWindow *fw = it->data;
+hole64_start = fw->mr.addr + memory_region_size(&fw->mr);
+}
+}
 } else if (pcmc->has_reserved_memory && ms->device_memory->base) {
 hole64_start = ms->device_memory->base;
 if (!pcmc->broken_reserved_end) {
-- 
2.32.0

[PATCH v9 43/45] pci-bridge/cxl_upstream: Add a CXL switch upstream port

2022-04-04 Thread Jonathan Cameron via

An initial simple upstream port emulation to allow the creation
of CXL switches. The Device ID has been allocated for this use.

Signed-off-by: Jonathan Cameron 
---
 hw/pci-bridge/cxl_upstream.c | 211 +++
 hw/pci-bridge/meson.build|   2 +-
 include/hw/cxl/cxl.h |   4 +
 3 files changed, 216 insertions(+), 1 deletion(-)

diff --git a/hw/pci-bridge/cxl_upstream.c b/hw/pci-bridge/cxl_upstream.c
new file mode 100644
index 00..5a06aeef67
--- /dev/null
+++ b/hw/pci-bridge/cxl_upstream.c
@@ -0,0 +1,211 @@
+/*
+ * Emulated CXL Switch Upstream Port
+ *
+ * Copyright (c) 2022 Huawei Technologies.
+ *
+ * Based on xio31130_upstream.c
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/pcie.h"
+#include "hw/pci/pcie_port.h"
+
+#define CXL_UPSTREAM_PORT_MSI_NR_VECTOR 1
+
+#define CXL_UPSTREAM_PORT_MSI_OFFSET 0x70
+#define CXL_UPSTREAM_PORT_PCIE_CAP_OFFSET 0x90
+#define CXL_UPSTREAM_PORT_AER_OFFSET 0x100
+#define CXL_UPSTREAM_PORT_DVSEC_OFFSET \
+(CXL_UPSTREAM_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
+
+typedef struct CXLUpstreamPort {
+/*< private >*/
+PCIEPort parent_obj;
+
+/*< public >*/
+CXLComponentState cxl_cstate;
+} CXLUpstreamPort;
+
+CXLComponentState *cxl_usp_to_cstate(CXLUpstreamPort *usp)
+{
+return &usp->cxl_cstate;
+}
+
+static void cxl_usp_dvsec_write_config(PCIDevice *dev, uint32_t addr,
+   uint32_t val, int len)
+{
+CXLUpstreamPort *usp = CXL_USP(dev);
+
+if (range_contains(&usp->cxl_cstate.dvsecs[EXTENSIONS_PORT_DVSEC], addr)) {
+uint8_t *reg = &dev->config[addr];
+addr -= usp->cxl_cstate.dvsecs[EXTENSIONS_PORT_DVSEC].lob;
+if (addr == PORT_CONTROL_OFFSET) {
+if (pci_get_word(reg) & PORT_CONTROL_UNMASK_SBR) {
+/* unmask SBR */
+qemu_log_mask(LOG_UNIMP, "SBR mask control is not supported\n");
+}
+if (pci_get_word(reg) & PORT_CONTROL_ALT_MEMID_EN) {
+/* Alt Memory & ID Space Enable */
+qemu_log_mask(LOG_UNIMP,
+  "Alt Memory & ID space is not supported\n");
+}
+}
+}
+}
+
+static void cxl_usp_write_config(PCIDevice *d, uint32_t address,
+ uint32_t val, int len)
+{
+pci_bridge_write_config(d, address, val, len);
+pcie_cap_flr_write_config(d, address, val, len);
+pcie_aer_write_config(d, address, val, len);
+
+cxl_usp_dvsec_write_config(d, address, val, len);
+}
+
+static void latch_registers(CXLUpstreamPort *usp)
+{
+uint32_t *reg_state = usp->cxl_cstate.crb.cache_mem_registers;
+
+cxl_component_register_init_common(reg_state, CXL2_UPSTREAM_PORT);
+ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, TARGET_COUNT, 8);
+}
+
+static void cxl_usp_reset(DeviceState *qdev)
+{
+PCIDevice *d = PCI_DEVICE(qdev);
+CXLUpstreamPort *usp = CXL_USP(qdev);
+
+pci_bridge_reset(qdev);
+pcie_cap_deverr_reset(d);
+latch_registers(usp);
+}
+
+static void build_dvsecs(CXLComponentState *cxl)
+{
+uint8_t *dvsec;
+
+dvsec = (uint8_t *)&(CXLDVSECPortExtensions){
+.status = 0x1, /* Port Power Management Init Complete */
+};
+cxl_component_create_dvsec(cxl, EXTENSIONS_PORT_DVSEC_LENGTH,
+   EXTENSIONS_PORT_DVSEC,
+   EXTENSIONS_PORT_DVSEC_REVID, dvsec);
+dvsec = (uint8_t *)&(CXLDVSECPortFlexBus){
+.cap = 0x27, /* Cache, IO, Mem, non-MLD */
+.ctrl= 0x27, /* Cache, IO, Mem */
+.status  = 0x26, /* same */
+.rcvd_mod_ts_data_phase1 = 0xef, /* WTF? */
+};
+cxl_component_create_dvsec(cxl, PCIE_FLEXBUS_PORT_DVSEC_LENGTH_2_0,
+   PCIE_FLEXBUS_PORT_DVSEC,
+   PCIE_FLEXBUS_PORT_DVSEC_REVID_2_0, dvsec);
+
+dvsec = (uint8_t *)&(CXLDVSECRegisterLocator){
+.rsvd = 0,
+.reg0_base_lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX,
+.reg0_base_hi = 0,
+};
+cxl_component_create_dvsec(cxl, REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC,
+   REG_LOC_DVSEC_REVID, dvsec);
+}
+
+static void cxl_usp_realize(PCIDevice *d, Error **errp)
+{
+PCIEPort *p = PCIE_PORT(d);
+CXLUpstreamPort *usp = CXL_USP(d);
+CXLComponentState *cxl_cstate = &usp->cxl_cstate;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+MemoryRegion *component_bar = &cregs->component_registers;
+int rc;
+
+pci_bridge_initfn(d, TYPE_PCIE_BUS);
+pcie_port_init_reg(d);
+
+rc = msi_init(d, CXL_UPSTREAM_PORT_MSI_OFFSET,
+  CXL_UPSTREAM_PORT_MSI_NR_VECTOR, true, true, errp);
+if (rc) {
+assert(rc == -ENOTSUP);
+goto err_bridge;
+}
+
+rc = pcie_cap_init(d, CXL_UPSTREAM_PORT_PCIE_CAP_OFFSET,
+

[PATCH v9 30/45] pci/pcie_port: Add pci_find_port_by_pn()

2022-04-04 Thread Jonathan Cameron via

From: Jonathan Cameron 

Simple function to search a PCIBus to find a port by
its port number.

CXL interleave decoding uses the port number as a target
so it is necessary to locate the port when doing interleave
decoding.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci/pcie_port.c | 25 +
 include/hw/pci/pcie_port.h |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/hw/pci/pcie_port.c b/hw/pci/pcie_port.c
index e95c1e5519..687e4e763a 100644
--- a/hw/pci/pcie_port.c
+++ b/hw/pci/pcie_port.c
@@ -136,6 +136,31 @@ static void pcie_port_class_init(ObjectClass *oc, void *data)
 device_class_set_props(dc, pcie_port_props);
 }
 
+PCIDevice *pcie_find_port_by_pn(PCIBus *bus, uint8_t pn)
+{
+int devfn;
+
+for (devfn = 0; devfn < ARRAY_SIZE(bus->devices); devfn++) {
+PCIDevice *d = bus->devices[devfn];
+PCIEPort *port;
+
+if (!d || !pci_is_express(d) || !d->exp.exp_cap) {
+continue;
+}
+
+if (!object_dynamic_cast(OBJECT(d), TYPE_PCIE_PORT)) {
+continue;
+}
+
+port = PCIE_PORT(d);
+if (port->port == pn) {
+return d;
+}
+}
+
+return NULL;
+}
+
 static const TypeInfo pcie_port_type_info = {
 .name = TYPE_PCIE_PORT,
 .parent = TYPE_PCI_BRIDGE,
diff --git a/include/hw/pci/pcie_port.h b/include/hw/pci/pcie_port.h
index e25b289ce8..7b8193061a 100644
--- a/include/hw/pci/pcie_port.h
+++ b/include/hw/pci/pcie_port.h
@@ -39,6 +39,8 @@ struct PCIEPort {
 
 void pcie_port_init_reg(PCIDevice *d);
 
+PCIDevice *pcie_find_port_by_pn(PCIBus *bus, uint8_t pn);
+
 #define TYPE_PCIE_SLOT "pcie-slot"
 OBJECT_DECLARE_SIMPLE_TYPE(PCIESlot, PCIE_SLOT)
 
-- 
2.32.0
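
The lookup added here is a linear scan over the bus's device table that skips empty slots and non-port devices. The following is a minimal sketch of the same idea outside QEMU. The struct device and struct bus types and the find_port_by_pn name are hypothetical simplifications; the is_port flag stands in for the PCIe capability and TYPE_PCIE_PORT checks in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SLOTS 256 /* one entry per devfn */

/* Hypothetical, simplified device: not the QEMU PCIDevice layout. */
struct device {
    int is_port;  /* stands in for the TYPE_PCIE_PORT dynamic cast */
    uint8_t port; /* port number */
};

struct bus {
    struct device *devices[NUM_SLOTS];
};

/* Return the first port on the bus whose port number is pn, else NULL. */
struct device *find_port_by_pn(struct bus *bus, uint8_t pn)
{
    size_t devfn;

    assert(bus != NULL);
    for (devfn = 0; devfn < NUM_SLOTS; devfn++) {
        struct device *d = bus->devices[devfn];

        if (!d || !d->is_port) {
            continue; /* empty slot or not a port */
        }
        if (d->port == pn) {
            return d;
        }
    }
    return NULL;
}
```

A non-port device that happens to carry the same number is ignored, which is why the type check comes before the comparison.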

[PATCH v9 15/45] qtest/cxl: Introduce initial test for pxb-cxl only.

2022-04-04 Thread Jonathan Cameron via

Initial test with just pxb-cxl.  Other tests will be added
alongside functionality.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
Tested-by: Alex Bennée 
---
 tests/qtest/cxl-test.c  | 23 +++
 tests/qtest/meson.build |  4 
 2 files changed, 27 insertions(+)

diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
new file mode 100644
index 00..1006c8ae4e
--- /dev/null
+++ b/tests/qtest/cxl-test.c
@@ -0,0 +1,23 @@
+/*
+ * QTest testcase for CXL
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest-single.h"
+
+
+static void cxl_basic_pxb(void)
+{
+qtest_start("-machine q35,cxl=on -device pxb-cxl,bus=pcie.0");
+qtest_end();
+}
+
+int main(int argc, char **argv)
+{
+g_test_init(&argc, &argv, NULL);
+qtest_add_func("/pci/cxl/basic_pxb", cxl_basic_pxb);
+return g_test_run();
+}
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index d25f82bb5a..6e1ad4dc9a 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -41,6 +41,9 @@ qtests_pci = \
   (config_all_devices.has_key('CONFIG_VGA') ? ['display-vga-test'] : []) + \
   (config_all_devices.has_key('CONFIG_IVSHMEM_DEVICE') ? ['ivshmem-test'] : [])
 
+qtests_cxl = \
+  (config_all_devices.has_key('CONFIG_CXL') ? ['cxl-test'] : [])
+
 qtests_i386 = \
   (slirp.found() ? ['pxe-test', 'test-netfilter'] : []) + \
   (config_host.has_key('CONFIG_POSIX') ? ['test-filter-mirror'] : []) + \
@@ -75,6 +78,7 @@ qtests_i386 = \
 slirp.found() ? ['virtio-net-failover'] : []) + \
   (unpack_edk2_blobs ? ['bios-tables-test'] : []) + \
   qtests_pci + \
+  qtests_cxl + \
   ['fdc-test',
'ide-test',
'hd-geo-test',
-- 
2.32.0

[PATCH v9 34/45] hw/cxl/component Add a dumb HDM decoder handler

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

Add a trivial handler for now to cover the root bridge
where we could do some error checking in future.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-component-utils.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
index 1a1adbd4cb..148f9f30d9 100644
--- a/hw/cxl/cxl-component-utils.c
+++ b/hw/cxl/cxl-component-utils.c
@@ -32,6 +32,31 @@ static uint64_t cxl_cache_mem_read_reg(void *opaque, hwaddr offset,
 }
 }
 
+static void dumb_hdm_handler(CXLComponentState *cxl_cstate, hwaddr offset,
+ uint32_t value)
+{
+ComponentRegisters *cregs = &cxl_cstate->crb;
+uint32_t *cache_mem = cregs->cache_mem_registers;
+bool should_commit = false;
+
+switch (offset) {
+case A_CXL_HDM_DECODER0_CTRL:
+should_commit = FIELD_EX32(value, CXL_HDM_DECODER0_CTRL, COMMIT);
+break;
+default:
+break;
+}
+
+memory_region_transaction_begin();
+stl_le_p((uint8_t *)cache_mem + offset, value);
+if (should_commit) {
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMIT, 0);
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERR, 0);
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
+}
+memory_region_transaction_commit();
+}
+
 static void cxl_cache_mem_write_reg(void *opaque, hwaddr offset, uint64_t value,
 unsigned size)
 {
@@ -45,6 +70,12 @@ static void cxl_cache_mem_write_reg(void *opaque, hwaddr offset, uint64_t value,
 }
 if (cregs->special_ops && cregs->special_ops->write) {
 cregs->special_ops->write(cxl_cstate, offset, value, size);
+return;
+}
+
+if (offset >= A_CXL_HDM_DECODER_CAPABILITY &&
+offset <= A_CXL_HDM_DECODER0_TARGET_LIST_HI) {
+dumb_hdm_handler(cxl_cstate, offset, value);
 } else {
 cregs->cache_mem_registers[offset / sizeof(*cregs->cache_mem_registers)] = value;
 }
-- 
2.32.0
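
The handler above always succeeds: a write that sets COMMIT in the decoder 0 control register is stored, then COMMIT and ERR are cleared and COMMITTED is set. A minimal sketch of that bit manipulation on a single 32-bit value is shown below. The bit positions (COMMIT at bit 9, COMMITTED at bit 10, ERR at bit 11) are an assumption of this sketch and should be checked against the CXL specification; hdm_ctrl_write is a hypothetical name, not QEMU API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed bit positions for the HDM decoder control register; verify
 * against the CXL specification before relying on them.
 */
#define CTRL_COMMIT    (1u << 9)
#define CTRL_COMMITTED (1u << 10)
#define CTRL_ERR       (1u << 11)

/*
 * Compute the register contents after a guest write. If the write sets
 * COMMIT, clear COMMIT and ERR and report COMMITTED, mirroring the
 * unconditional success of the handler in the patch.
 */
uint32_t hdm_ctrl_write(uint32_t value)
{
    uint32_t reg = value;

    if (value & CTRL_COMMIT) {
        reg &= ~(CTRL_COMMIT | CTRL_ERR);
        reg |= CTRL_COMMITTED;
    }
    return reg;
}
```

Writes that do not set COMMIT pass through unchanged, which matches the default branch of the switch in the patch.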

[PATCH v9 23/45] hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

CXL host bridges themselves may have MMIO. Since host bridges don't have
a BAR they are treated as special for MMIO.  This patch includes
i386/pc support.
Also hook up the device reset now that we have the MMIO
space in which the results are visible.

Note that we duplicate the PCI express case for the aml_build but
the implementations will diverge when the CXL specific _OSC is
introduced.

Signed-off-by: Ben Widawsky 
Co-developed-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/i386/acpi-build.c| 25 ++-
 hw/i386/pc.c| 27 +++-
 hw/pci-bridge/pci_expander_bridge.c | 65 +
 include/hw/cxl/cxl.h| 14 +++
 4 files changed, 121 insertions(+), 10 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index dcf6ece3d0..2d81b0f40c 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -28,6 +28,7 @@
 #include "qemu/bitmap.h"
 #include "qemu/error-report.h"
 #include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
 #include "hw/core/cpu.h"
 #include "target/i386/cpu.h"
 #include "hw/misc/pvpanic.h"
@@ -1572,10 +1573,21 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 }
 
 scope = aml_scope("\\_SB");
-dev = aml_device("PC%.02X", bus_num);
+
+if (pci_bus_is_cxl(bus)) {
+dev = aml_device("CL%.02X", bus_num);
+} else {
+dev = aml_device("PC%.02X", bus_num);
+}
 aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
-if (pci_bus_is_express(bus)) {
+if (pci_bus_is_cxl(bus)) {
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
+aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
+
+/* Expander bridges do not have ACPI PCI Hot-plug enabled */
+aml_append(dev, build_q35_osc_method(true));
+} else if (pci_bus_is_express(bus)) {
 aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
 aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
 
@@ -1595,6 +1607,15 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 aml_append(dev, aml_name_decl("_CRS", crs));
 aml_append(scope, dev);
 aml_append(dsdt, scope);
+
+/* Handle the ranges for the PXB expanders */
+if (pci_bus_is_cxl(bus)) {
+MemoryRegion *mr = >cxl_devices_state->host_mr;
+uint64_t base = mr->addr;
+
+crs_range_insert(crs_range_set.mem_ranges, base,
+ base + memory_region_size(mr) - 1);
+}
 }
 }
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e2849fc741..da74f08f9e 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -75,6 +75,7 @@
 #include "acpi-build.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/cxl/cxl.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-common.h"
 #include "qapi/qapi-visit-machine.h"
@@ -813,6 +814,7 @@ void pc_memory_init(PCMachineState *pcms,
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(pcms);
+hwaddr cxl_base;
 
 assert(machine->ram_size == x86ms->below_4g_mem_size +
 x86ms->above_4g_mem_size);
@@ -902,6 +904,26 @@ void pc_memory_init(PCMachineState *pcms,
 >device_memory->mr);
 }
 
+if (machine->cxl_devices_state->is_enabled) {
+MemoryRegion *mr = &machine->cxl_devices_state->host_mr;
+hwaddr cxl_size = MiB;
+
+if (pcmc->has_reserved_memory && machine->device_memory->base) {
+cxl_base = machine->device_memory->base;
+if (!pcmc->broken_reserved_end) {
+cxl_base += memory_region_size(&machine->device_memory->mr);
+}
+} else if (pcms->sgx_epc.size != 0) {
+cxl_base = sgx_epc_above_4g_end(&pcms->sgx_epc);
+} else {
+cxl_base = 0x100000000ULL + x86ms->above_4g_mem_size;
+}
+
+e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
+memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
+memory_region_add_subregion(system_memory, cxl_base, mr);
+}
+
 /* Initialize PC system firmware */
 pc_system_firmware_init(pcms, rom_memory);
 
@@ -962,7 +984,10 @@ uint64_t pc_pci_hole64_start(void)
 X86MachineState *x86ms = X86_MACHINE(pcms);
 uint64_t hole64_start = 0;
 
-if (pcmc->has_reserved_memory && ms->device_memory->base) {
+if (ms->cxl_devices_state->host_mr.addr) {
+hole64_start = ms->cxl_devices_state->host_mr.addr +
+

[PATCH v9 28/45] acpi/cxl: Introduce CFMWS structures in CEDT

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

The CEDT CXL Fixed Memory Window Structures (CFMWS)
define regions of the host physical address map which
(via implementation-defined means) are configured such that they have
a particular interleave setup across one or more CXL Host Bridges.

Reported-by: Alison Schofield 
Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/acpi/cxl.c | 59 +++
 1 file changed, 59 insertions(+)

diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
index aa4af86a4c..31d5235136 100644
--- a/hw/acpi/cxl.c
+++ b/hw/acpi/cxl.c
@@ -60,6 +60,64 @@ static void cedt_build_chbs(GArray *table_data, PXBDev *cxl)
 build_append_int_noprefix(table_data, memory_region_size(mr), 8);
 }
 
+/*
+ * CFMWS entries in CXL 2.0 ECN: CEDT CFMWS & QTG _DSM.
+ * Interleave ways encoding in CXL 2.0 ECN: 3, 6, 12 and 16-way memory
+ * interleaving.
+ */
+static void cedt_build_cfmws(GArray *table_data, MachineState *ms)
+{
+CXLState *cxls = ms->cxl_devices_state;
+GList *it;
+
+for (it = cxls->fixed_windows; it; it = it->next) {
+CXLFixedWindow *fw = it->data;
+int i;
+
+/* Type */
+build_append_int_noprefix(table_data, 1, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Record Length */
+build_append_int_noprefix(table_data, 36 + 4 * fw->num_targets, 2);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Base HPA */
+build_append_int_noprefix(table_data, fw->mr.addr, 8);
+
+/* Window Size */
+build_append_int_noprefix(table_data, fw->size, 8);
+
+/* Host Bridge Interleave Ways */
+build_append_int_noprefix(table_data, fw->enc_int_ways, 1);
+
+/* Host Bridge Interleave Arithmetic */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+
+/* Host Bridge Interleave Granularity */
+build_append_int_noprefix(table_data, fw->enc_int_gran, 4);
+
+/* Window Restrictions */
+build_append_int_noprefix(table_data, 0x0f, 2); /* No restrictions */
+
+/* QTG ID */
+build_append_int_noprefix(table_data, 0, 2);
+
+/* Host Bridge List (list of UIDs - currently bus_nr) */
+for (i = 0; i < fw->num_targets; i++) {
+g_assert(fw->target_hbs[i]);
+build_append_int_noprefix(table_data, fw->target_hbs[i]->bus_nr, 4);
+}
+}
+}
+
 static int cxl_foreach_pxb_hb(Object *obj, void *opaque)
 {
 Aml *cedt = opaque;
@@ -86,6 +144,7 @@ void cxl_build_cedt(MachineState *ms, GArray *table_offsets, GArray *table_data,
 /* reserve space for CEDT header */
 
 object_child_foreach_recursive(object_get_root(), cxl_foreach_pxb_hb, cedt);
+cedt_build_cfmws(cedt->buf, ms);
 
 /* copy AML table into ACPI tables blob and patch header there */
 g_array_append_vals(table_data, cedt->buf->data, cedt->buf->len);
-- 
2.32.0

[PATCH v9 13/45] cxl: Machine level control on whether CXL support is enabled

2022-04-04 Thread Jonathan Cameron via

From: Jonathan Cameron 

There are going to be some potential overheads to CXL enablement,
for example the host bridge region reserved in memory maps.
Add a machine level control so that CXL is disabled by default.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/core/machine.c| 28 
 hw/i386/pc.c |  1 +
 include/hw/boards.h  |  2 ++
 include/hw/cxl/cxl.h |  4 
 4 files changed, 35 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index d856485cb4..6ff5dba64e 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -31,6 +31,7 @@
 #include "sysemu/qtest.h"
 #include "hw/pci/pci.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/cxl/cxl.h"
 #include "migration/global_state.h"
 #include "migration/vmstate.h"
 #include "exec/confidential-guest-support.h"
@@ -545,6 +546,20 @@ static void machine_set_nvdimm_persistence(Object *obj, const char *value,
 nvdimms_state->persistence_string = g_strdup(value);
 }
 
+static bool machine_get_cxl(Object *obj, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+return ms->cxl_devices_state->is_enabled;
+}
+
+static void machine_set_cxl(Object *obj, bool value, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+ms->cxl_devices_state->is_enabled = value;
+}
+
 void machine_class_allow_dynamic_sysbus_dev(MachineClass *mc, const char *type)
 {
 QAPI_LIST_PREPEND(mc->allowed_dynamic_sysbus_devices, g_strdup(type));
@@ -777,6 +792,8 @@ static void machine_class_init(ObjectClass *oc, void *data)
 mc->default_ram_size = 128 * MiB;
 mc->rom_file_has_mr = true;
 
+/* Few machines support CXL, so default to off */
+mc->cxl_supported = false;
 /* numa node memory size aligned on 8MB by default.
  * On Linux, each node's border has to be 8MB aligned
  */
@@ -922,6 +939,16 @@ static void machine_initfn(Object *obj)
 "Valid values are cpu, mem-ctrl");
 }
 
+if (mc->cxl_supported) {
+Object *obj = OBJECT(ms);
+
+ms->cxl_devices_state = g_new0(CXLState, 1);
+object_property_add_bool(obj, "cxl", machine_get_cxl, machine_set_cxl);
+object_property_set_description(obj, "cxl",
+"Set on/off to enable/disable "
+"CXL instantiation");
+}
+
 if (mc->cpu_index_to_instance_props && mc->get_default_cpu_node_id) {
 ms->numa_state = g_new0(NumaState, 1);
 object_property_add_bool(obj, "hmat",
@@ -956,6 +983,7 @@ static void machine_finalize(Object *obj)
 g_free(ms->device_memory);
 g_free(ms->nvdimms_state);
 g_free(ms->numa_state);
+g_free(ms->cxl_devices_state);
 }
 
 bool machine_usb(MachineState *machine)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index fd55fc725c..e2849fc741 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1758,6 +1758,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
 mc->nvdimm_supported = true;
 mc->smp_props.dies_supported = true;
+mc->cxl_supported = true;
 mc->default_ram_id = "pc.ram";
 
 object_class_property_add(oc, PC_MACHINE_MAX_RAM_BELOW_4G, "size",
diff --git a/include/hw/boards.h b/include/hw/boards.h
index c92ac8815c..680718dafc 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -269,6 +269,7 @@ struct MachineClass {
 bool ignore_boot_device_suffixes;
 bool smbus_no_migration_support;
 bool nvdimm_supported;
+bool cxl_supported;
 bool numa_mem_supported;
 bool auto_enable_numa;
 SMPCompatProps smp_props;
@@ -360,6 +361,7 @@ struct MachineState {
 CPUArchIdList *possible_cpus;
 CpuTopology smp;
 struct NVDIMMState *nvdimms_state;
+struct CXLState *cxl_devices_state;
 struct NumaState *numa_state;
 };
 
diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 554ad93b6b..31af92fd5e 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -17,4 +17,8 @@
 #define CXL_COMPONENT_REG_BAR_IDX 0
 #define CXL_DEVICE_REG_BAR_IDX 2
 
+typedef struct CXLState {
+bool is_enabled;
+} CXLState;
+
 #endif
-- 
2.32.0

[PATCH v9 22/45] qtests/cxl: Add initial root port and CXL type3 tests

2022-04-04 Thread Jonathan Cameron via

At this stage we can boot configurations with host bridges,
root ports and type 3 memory devices, so add appropriate
tests.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 tests/qtest/cxl-test.c | 126 +
 1 file changed, 126 insertions(+)

diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
index 1006c8ae4e..5f0794e816 100644
--- a/tests/qtest/cxl-test.c
+++ b/tests/qtest/cxl-test.c
@@ -8,6 +8,54 @@
 #include "qemu/osdep.h"
 #include "libqtest-single.h"
 
+#define QEMU_PXB_CMD "-machine q35,cxl=on " \
+ "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "
+
+#define QEMU_2PXB_CMD "-machine q35,cxl=on " \
+  "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "  \
+  "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 "
+
+#define QEMU_RP "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 "
+
+/* Dual ports on first pxb */
+#define QEMU_2RP "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 " \
+ "-device cxl-rp,id=rp1,bus=cxl.0,chassis=0,slot=1 "
+
+/* Dual ports on each of the pxb instances */
+#define QEMU_4RP "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 " \
+ "-device cxl-rp,id=rp1,bus=cxl.0,chassis=0,slot=1 " \
+ "-device cxl-rp,id=rp2,bus=cxl.1,chassis=0,slot=2 " \
+ "-device cxl-rp,id=rp3,bus=cxl.1,chassis=0,slot=3 "
+
+#define QEMU_T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
+ "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " \
+ "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 "
+
+#define QEMU_2T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
+  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " \
+  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
+  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M " \
+  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M " \
+  "-device cxl-type3,bus=rp1,memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 "
+
+#define QEMU_4T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
+  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " \
+  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
+  "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M " \
+  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M " \
+  "-device cxl-type3,bus=rp1,memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 " \
+  "-object memory-backend-file,id=cxl-mem2,mem-path=%s,size=256M " \
+  "-object memory-backend-file,id=lsa2,mem-path=%s,size=256M " \
+  "-device cxl-type3,bus=rp2,memdev=cxl-mem2,lsa=lsa2,id=cxl-pmem2 " \
+  "-object memory-backend-file,id=cxl-mem3,mem-path=%s,size=256M " \
+  "-object memory-backend-file,id=lsa3,mem-path=%s,size=256M " \
+  "-device cxl-type3,bus=rp3,memdev=cxl-mem3,lsa=lsa3,id=cxl-pmem3 "
+
+static void cxl_basic_hb(void)
+{
+qtest_start("-machine q35,cxl=on");
+qtest_end();
+}
 
 static void cxl_basic_pxb(void)
 {
@@ -15,9 +63,87 @@ static void cxl_basic_pxb(void)
 qtest_end();
 }
 
+static void cxl_pxb_with_window(void)
+{
+qtest_start(QEMU_PXB_CMD);
+qtest_end();
+}
+
+static void cxl_2pxb_with_window(void)
+{
+qtest_start(QEMU_2PXB_CMD);
+qtest_end();
+}
+
+static void cxl_root_port(void)
+{
+qtest_start(QEMU_PXB_CMD QEMU_RP);
+qtest_end();
+}
+
+static void cxl_2root_port(void)
+{
+qtest_start(QEMU_PXB_CMD QEMU_2RP);
+qtest_end();
+}
+
+static void cxl_t3d(void)
+{
+g_autoptr(GString) cmdline = g_string_new(NULL);
+char template[] = "/tmp/cxl-test-XXXXXX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+g_string_printf(cmdline, QEMU_PXB_CMD QEMU_RP QEMU_T3D, tmpfs, tmpfs);
+
+qtest_start(cmdline->str);
+qtest_end();
+}
+
+static void cxl_1pxb_2rp_2t3d(void)
+{
+g_autoptr(GString) cmdline = g_string_new(NULL);
+char template[] = "/tmp/cxl-test-XXXXXX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+g_string_printf(cmdline, QEMU_PXB_CMD QEMU_2RP QEMU_2T3D,
+tmpfs, tmpfs, tmpfs, tmpfs);
+
+qtest_start(cmdline->str);
+qtest_end();
+}
+
+static void cxl_2pxb_4rp_4t3d(void)
+{
+g_autoptr(GString) cmdline = g_string_new(NULL);
+char template[] = "/tmp/cxl-test-XXXXXX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+g_string_printf(cmdline, QEMU_2PXB_CMD QEMU_4RP QEMU_4T3D,
+tmpfs, tmpfs, tmpfs, tmpfs, tmpfs, tmpfs,
+tmpfs, tmpfs);
+
+qtest_start(cmdline->str);
+qtest_end();
+}
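
The tests above each create a private temporary directory and substitute it into every %s slot of the device arguments (two slots per type 3 device: one for the memory backend, one for the LSA). A minimal standalone sketch of that pattern, assuming plain libc instead of GLib and qtest; T3D_ARGS and build_t3d_cmdline() are hypothetical names used only for illustration:

```c
#define _POSIX_C_SOURCE 200809L /* exposes mkdtemp() on glibc */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for QEMU_T3D: two %s slots per device. */
#define T3D_ARGS \
    "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
    "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "

/*
 * Create a private temporary directory and format the device arguments
 * into buf. Returns 0 on success, -1 on failure. dir_out receives the
 * directory path so that the caller can remove it afterwards.
 */
static int build_t3d_cmdline(char *buf, size_t buflen,
                             char *dir_out, size_t dirlen)
{
    /* mkdtemp() requires the template to end in exactly six X's. */
    char template[] = "/tmp/cxl-test-XXXXXX";
    const char *dir = mkdtemp(template);
    int n;

    if (!dir) {
        return -1;
    }

    n = snprintf(buf, buflen, T3D_ARGS, dir, dir);
    if (n < 0 || (size_t)n >= buflen || strlen(dir) >= dirlen) {
        rmdir(dir);
        return -1;
    }

    strcpy(dir_out, dir);
    return 0;
}
```

Unlike the test code, the sketch checks the mkdtemp() return value; a NULL result would otherwise be passed straight into the format string.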
+
 int

[PATCH v9 26/45] hw/cxl/component: Add utils for interleave parameter encoding/decoding

2022-04-04 Thread Jonathan Cameron via

From: Jonathan Cameron 

Both registers and the CFMWS entries in CDAT use simple encodings
for the number of interleave ways and the interleave granularity.
Introduce simple conversion functions to/from the unencoded
number / size.  So far the iw decode has not been needed so
it is not implemented.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-component-utils.c   | 34 ++
 include/hw/cxl/cxl_component.h |  8 
 2 files changed, 42 insertions(+)

diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
index 22e52cef17..1a1adbd4cb 100644
--- a/hw/cxl/cxl-component-utils.c
+++ b/hw/cxl/cxl-component-utils.c
@@ -9,6 +9,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "qapi/error.h"
 #include "hw/pci/pci.h"
 #include "hw/cxl/cxl.h"
 
@@ -223,3 +224,36 @@ void cxl_component_create_dvsec(CXLComponentState *cxl, uint16_t length,
 range_init_nofail(&cxl->dvsecs[type], cxl->dvsec_offset, length);
 cxl->dvsec_offset += length;
 }
+
+uint8_t cxl_interleave_ways_enc(int iw, Error **errp)
+{
+switch (iw) {
+case 1: return 0x0;
+case 2: return 0x1;
+case 4: return 0x2;
+case 8: return 0x3;
+case 16: return 0x4;
+case 3: return 0x8;
+case 6: return 0x9;
+case 12: return 0xa;
+default:
+error_setg(errp, "Interleave ways: %d not supported", iw);
+return 0;
+}
+}
+
+uint8_t cxl_interleave_granularity_enc(uint64_t gran, Error **errp)
+{
+switch (gran) {
+case 256: return 0;
+case 512: return 1;
+case 1024: return 2;
+case 2048: return 3;
+case 4096: return 4;
+case 8192: return 5;
+case 16384: return 6;
+default:
+error_setg(errp, "Interleave granularity: %" PRIu64 " invalid", gran);
+return 0;
+}
+}
diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
index 5b15bd6c3f..b0f95d3484 100644
--- a/include/hw/cxl/cxl_component.h
+++ b/include/hw/cxl/cxl_component.h
@@ -194,4 +194,12 @@ void cxl_component_register_init_common(uint32_t *reg_state,
 void cxl_component_create_dvsec(CXLComponentState *cxl_cstate, uint16_t length,
 uint16_t type, uint8_t rev, uint8_t *body);
 
+uint8_t cxl_interleave_ways_enc(int iw, Error **errp);
+uint8_t cxl_interleave_granularity_enc(uint64_t gran, Error **errp);
+
+static inline hwaddr cxl_decode_ig(int ig)
+{
+return 1 << (ig + 8);
+}
+
 #endif
-- 
2.32.0
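
For readers who want to try the encodings outside QEMU, here is a minimal standalone sketch of the two tables. It assumes a plain -1 return for unsupported values instead of the Error **errp reporting used in the patch, and iw_dec() is a hypothetical inverse that the patch deliberately leaves out:

```c
#include <assert.h>
#include <stdint.h>

/* Encode the number of interleave ways; -1 if the value is unsupported. */
static int iw_enc(int iw)
{
    switch (iw) {
    case 1:  return 0x0;
    case 2:  return 0x1;
    case 4:  return 0x2;
    case 8:  return 0x3;
    case 16: return 0x4;
    case 3:  return 0x8;
    case 6:  return 0x9;
    case 12: return 0xa;
    default: return -1;
    }
}

/* Hypothetical inverse: 0x0..0x4 are powers of two, 0x8..0xa are 3 * 2^n. */
static int iw_dec(int enc)
{
    if (enc >= 0x0 && enc <= 0x4) {
        return 1 << enc;
    }
    if (enc >= 0x8 && enc <= 0xa) {
        return 3 << (enc - 0x8);
    }
    return -1;
}

/* Encode the granularity in bytes (256 .. 16384); -1 if invalid. */
static int ig_enc(uint64_t gran)
{
    for (int ig = 0; ig <= 6; ig++) {
        if (gran == (UINT64_C(1) << (ig + 8))) {
            return ig;
        }
    }
    return -1;
}

/* Decode the granularity, mirroring cxl_decode_ig() in the patch. */
static uint64_t ig_dec(int ig)
{
    return UINT64_C(1) << (ig + 8);
}
```

Both encodings round-trip for every value listed in the patch, which is a cheap property to check if a decoder is added later.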

[PATCH v9 16/45] hw/cxl/rp: Add a root port

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

This adds just enough of a root port implementation to be able to
enumerate root ports (creating the required DVSEC entries). What's not
here yet is the MMIO or the ability to write some of the DVSEC entries.

This can be added with the qemu commandline by adding a rootport to a
specific CXL host bridge. For example:
  -device cxl-rp,id=rp0,bus="cxl.0",addr=0.0,chassis=4

Like the host bridge patch, the ACPI tables aren't generated at this
point and so system software cannot use it.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci-bridge/Kconfig  |   5 +
 hw/pci-bridge/cxl_root_port.c  | 231 +
 hw/pci-bridge/meson.build  |   1 +
 hw/pci-bridge/pcie_root_port.c |   6 +-
 hw/pci/pci.c   |   4 +-
 5 files changed, 245 insertions(+), 2 deletions(-)

diff --git a/hw/pci-bridge/Kconfig b/hw/pci-bridge/Kconfig
index f8df4315ba..02614f49aa 100644
--- a/hw/pci-bridge/Kconfig
+++ b/hw/pci-bridge/Kconfig
@@ -27,3 +27,8 @@ config DEC_PCI
 
 config SIMBA
 bool
+
+config CXL
+bool
+default y if PCI_EXPRESS && PXB
+depends on PCI_EXPRESS && MSI_NONBROKEN && PXB
diff --git a/hw/pci-bridge/cxl_root_port.c b/hw/pci-bridge/cxl_root_port.c
new file mode 100644
index 0000000000..dfbf59ceb3
--- /dev/null
+++ b/hw/pci-bridge/cxl_root_port.c
@@ -0,0 +1,231 @@
+/*
+ * CXL 2.0 Root Port Implementation
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/range.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pcie_port.h"
+#include "hw/qdev-properties.h"
+#include "hw/sysbus.h"
+#include "qapi/error.h"
+#include "hw/cxl/cxl.h"
+
+#define CXL_ROOT_PORT_DID 0x7075
+
+/* Copied from the gen root port which we derive */
+#define GEN_PCIE_ROOT_PORT_AER_OFFSET 0x100
+#define GEN_PCIE_ROOT_PORT_ACS_OFFSET \
+(GEN_PCIE_ROOT_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
+#define CXL_ROOT_PORT_DVSEC_OFFSET \
+(GEN_PCIE_ROOT_PORT_ACS_OFFSET + PCI_ACS_SIZEOF)
+
+typedef struct CXLRootPort {
+/*< private >*/
+PCIESlot parent_obj;
+
+CXLComponentState cxl_cstate;
+PCIResReserve res_reserve;
+} CXLRootPort;
+
+#define TYPE_CXL_ROOT_PORT "cxl-rp"
+DECLARE_INSTANCE_CHECKER(CXLRootPort, CXL_ROOT_PORT, TYPE_CXL_ROOT_PORT)
+
+static void latch_registers(CXLRootPort *crp)
+{
+uint32_t *reg_state = crp->cxl_cstate.crb.cache_mem_registers;
+
+cxl_component_register_init_common(reg_state, CXL2_ROOT_PORT);
+}
+
+static void build_dvsecs(CXLComponentState *cxl)
+{
+uint8_t *dvsec;
+
+dvsec = (uint8_t *)&(CXLDVSECPortExtensions){ 0 };
+cxl_component_create_dvsec(cxl, EXTENSIONS_PORT_DVSEC_LENGTH,
+   EXTENSIONS_PORT_DVSEC,
+   EXTENSIONS_PORT_DVSEC_REVID, dvsec);
+
+dvsec = (uint8_t *)&(CXLDVSECPortGPF){
+.rsvd= 0,
+.phase1_ctrl = 1, /* 1μs timeout */
+.phase2_ctrl = 1, /* 1μs timeout */
+};
+cxl_component_create_dvsec(cxl, GPF_PORT_DVSEC_LENGTH, GPF_PORT_DVSEC,
+   GPF_PORT_DVSEC_REVID, dvsec);
+
+dvsec = (uint8_t *)&(CXLDVSECPortFlexBus){
+.cap = 0x26, /* IO, Mem, non-MLD */
+.ctrl= 0x2,
+.status  = 0x26, /* same */
+.rcvd_mod_ts_data_phase1 = 0xef,
+};
+cxl_component_create_dvsec(cxl, PCIE_FLEXBUS_PORT_DVSEC_LENGTH_2_0,
+   PCIE_FLEXBUS_PORT_DVSEC,
+   PCIE_FLEXBUS_PORT_DVSEC_REVID_2_0, dvsec);
+
+dvsec = (uint8_t *)&(CXLDVSECRegisterLocator){
+.rsvd = 0,
+.reg0_base_lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX,
+.reg0_base_hi = 0,
+};
+cxl_component_create_dvsec(cxl, REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC,
+   REG_LOC_DVSEC_REVID, dvsec);
+}
+
+static void cxl_rp_realize(DeviceState *dev, Error **errp)
+{
+PCIDevice *pci_dev = PCI_DEVICE(dev);
+PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
+CXLRootPort *crp   = CXL_ROOT_PORT(dev);
+CXLComponentState *cxl_cstate = &crp->cxl_cstate;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+MemoryRegion *component_bar =

[PATCH v9 25/45] acpi/cxl: Create the CEDT (9.14.1)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

The CXL Early Discovery Table is defined in the CXL 2.0 specification as
a way for the OS to get CXL specific information from the system
firmware.

CXL 2.0 specification adds an _HID, ACPI0016, for CXL capable host
bridges, with a _CID of PNP0A08 (PCIe host bridge). CXL aware software
is able to use this to initiate the proper _OSC method, and get the _UID
which is referenced by the CEDT. Therefore the existence of an ACPI0016
device allows a CXL aware driver to perform the necessary actions. For a
CXL capable OS, this works. For a CXL unaware OS, this works.

CEDT awareness requires more. The motivation for ACPI0017 is to provide
the possibility of having a Linux CXL module that can work on a legacy
Linux kernel. Linux core PCI/ACPI, which won't be built as a module,
will see the _CID of PNP0A08 and bind a driver to it. If we later loaded
a driver for ACPI0016, Linux won't be able to bind it to the hardware
because it has already bound the PNP0A08 driver. The ACPI0017 device is
an opportunity to have an object to bind a driver to; it will be used by
a Linux driver to walk the CXL topology and do everything that we would
have preferred to do with ACPI0016.

There is another motivation for an ACPI0017 device which isn't
implemented here. An operating system needs an attach point for a
non-volatile region provider that understands cross-hostbridge
interleaving. Since QEMU emulation doesn't support interleaving yet,
this is more important on the OS side, for now.

As of CXL 2.0 spec, only 1 sub structure is defined, the CXL Host Bridge
Structure (CHBS) which is primarily useful for telling the OS exactly
where the MMIO for the host bridge is.

Link: 
https://lore.kernel.org/linux-cxl/20210115034911.nkgpzc756d6qm...@intel.com/T/#t
Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/acpi/cxl.c   | 68 +
 hw/i386/acpi-build.c| 27 
 hw/pci-bridge/pci_expander_bridge.c | 17 
 include/hw/acpi/cxl.h   |  5 +++
 include/hw/pci/pci_bridge.h | 20 +
 5 files changed, 120 insertions(+), 17 deletions(-)

diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
index ca1f04f359..aa4af86a4c 100644
--- a/hw/acpi/cxl.c
+++ b/hw/acpi/cxl.c
@@ -18,7 +18,11 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pci_host.h"
 #include "hw/cxl/cxl.h"
+#include "hw/mem/memory-device.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/aml-build.h"
 #include "hw/acpi/bios-linker-loader.h"
@@ -26,6 +30,70 @@
 #include "qapi/error.h"
 #include "qemu/uuid.h"
 
+static void cedt_build_chbs(GArray *table_data, PXBDev *cxl)
+{
+SysBusDevice *sbd = SYS_BUS_DEVICE(cxl->cxl.cxl_host_bridge);
+struct MemoryRegion *mr = sbd->mmio[0].memory;
+
+/* Type */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Record Length */
+build_append_int_noprefix(table_data, 32, 2);
+
+/* UID - currently equal to bus number */
+build_append_int_noprefix(table_data, cxl->bus_nr, 4);
+
+/* Version */
+build_append_int_noprefix(table_data, 1, 4);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Base - subregion within a container that is in PA space */
+build_append_int_noprefix(table_data, mr->container->addr + mr->addr, 8);
+
+/* Length */
+build_append_int_noprefix(table_data, memory_region_size(mr), 8);
+}
+
+static int cxl_foreach_pxb_hb(Object *obj, void *opaque)
+{
+Aml *cedt = opaque;
+
+if (object_dynamic_cast(obj, TYPE_PXB_CXL_DEVICE)) {
+cedt_build_chbs(cedt->buf, PXB_CXL_DEV(obj));
+}
+
+return 0;
+}
+
+void cxl_build_cedt(MachineState *ms, GArray *table_offsets, GArray *table_data,
+BIOSLinker *linker, const char *oem_id,
+const char *oem_table_id)
+{
+Aml *cedt;
+AcpiTable table = { .sig = "CEDT", .rev = 1, .oem_id = oem_id,
+.oem_table_id = oem_table_id };
+
+acpi_add_table(table_offsets, table_data);
+acpi_table_begin(&table, table_data);
+cedt = init_aml_allocator();
+
+/* reserve space for CEDT header */
+
+object_child_foreach_recursive(object_get_root(), cxl_foreach_pxb_hb, cedt);
+
+/* copy AML table into ACPI tables blob and patch header there */
+g_array_append_vals(table_data, cedt->buf->data, cedt->buf->len);
+free_aml_allocator();
+
+acpi_table_end(linker, &table);
+}
+
 static Aml *__build_cxl_osc_method(void)
 {
 Aml *method, *if_uuid, *else_uuid, *if_arg1_not_1, *if_cxl, *if_caps_masked;
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 59ede8b2e9..c125939ed6 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -77,6 +77,7 @@
 #include "hw/acpi/ipmi.h"
 #include "hw/acpi/hmat.h"
 #include "hw/acpi/viot.h"
+#include "hw/acpi/cxl.h"
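
The CHBS record built by cedt_build_chbs() above is a fixed 32-byte little-endian structure. As an illustration only, the same field layout can be packed with plain C; put_le() and chbs_pack() are hypothetical helpers standing in for build_append_int_noprefix(), and the field values follow the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHBS_LEN 32

/* Store `size` bytes of `val` at buf[off], least significant byte first. */
static size_t put_le(uint8_t *buf, size_t off, uint64_t val, int size)
{
    for (int i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(val >> (8 * i));
    }
    return off + size;
}

/* Pack one CHBS record; returns the number of bytes written. */
static size_t chbs_pack(uint8_t buf[CHBS_LEN], uint32_t uid,
                        uint64_t base, uint64_t length)
{
    size_t off = 0;

    off = put_le(buf, off, 0, 1);        /* Type: CHBS */
    off = put_le(buf, off, 0, 1);        /* Reserved */
    off = put_le(buf, off, CHBS_LEN, 2); /* Record Length */
    off = put_le(buf, off, uid, 4);      /* UID */
    off = put_le(buf, off, 1, 4);        /* Version, as in the patch */
    off = put_le(buf, off, 0, 4);        /* Reserved */
    off = put_le(buf, off, base, 8);     /* Base of the register MMIO */
    off = put_le(buf, off, length, 8);   /* Length of the register MMIO */
    return off;
}
```

The field sizes add up to 1 + 1 + 2 + 4 + 4 + 4 + 8 + 8 = 32 bytes, which matches the Record Length value written into the record.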

Re: [PATCH v1 7/9] colo-compare: safe finalization

2022-04-04 Thread Maxim Davydov

The main problem is that if we call object_new_with_class() and then 
object_unref(), it fails. This is primarily because finalize expects 
that net/colo-compare.c:colo_compare_complete() has already been 
called.


On 3/30/22 17:54, Vladimir Sementsov-Ogievskiy wrote:

29.03.2022 00:15, Maxim Davydov wrote:

Fixes some possible issues with finalization. For example, finalization
immediately after instance_init fails on the assert.

Signed-off-by: Maxim Davydov 
---
  net/colo-compare.c | 25 -
  1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 62554b5b3c..81d8de0aaa 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -1426,7 +1426,7 @@ static void colo_compare_finalize(Object *obj)
  break;
  }
  }
-    if (QTAILQ_EMPTY(&net_compares)) {
if colo_compare_active == false, event_mtx and event_complete_cond 
weren't initialized in colo_compare_complete()

+    if (QTAILQ_EMPTY(&net_compares) && colo_compare_active) {
  colo_compare_active = false;
  qemu_mutex_destroy(&event_mtx);
  qemu_cond_destroy(&event_complete_cond);
@@ -1442,19 +1442,26 @@ static void colo_compare_finalize(Object *obj)
    colo_compare_timer_del(s);
  -    qemu_bh_delete(s->event_bh);
s->event_bh is only allocated in colo_compare_iothread(), which is 
called from colo_compare_complete(), so it may still be NULL here

+    if (s->event_bh) {
+    qemu_bh_delete(s->event_bh);
+    }
  -    AioContext *ctx = iothread_get_aio_context(s->iothread);
-    aio_context_acquire(ctx);
-    AIO_WAIT_WHILE(ctx, !s->out_sendco.done);
-    if (s->notify_dev) {
-    AIO_WAIT_WHILE(ctx, !s->notify_sendco.done);
s->iothread == NULL after .instance_init (it can be detected in 
colo_compare_complete(), if it has been called)

+    if (s->iothread) {
+    AioContext *ctx = iothread_get_aio_context(s->iothread);
+    aio_context_acquire(ctx);
+    AIO_WAIT_WHILE(ctx, !s->out_sendco.done);
+    if (s->notify_dev) {
+    AIO_WAIT_WHILE(ctx, !s->notify_sendco.done);
+    }
+    aio_context_release(ctx);
  }
-    aio_context_release(ctx);
    /* Release all unhandled packets after compare thead exited */
  g_queue_foreach(&s->conn_list, colo_flush_packets, s);
-    AIO_WAIT_WHILE(NULL, !s->out_sendco.done);
Normally this flushes all packets and sets s->out_sendco.done = true 
via compare_chr_send() (this is the event we wait for). But if 
colo_compare_complete() was never called, s->conn_list isn't 
initialized, s->out_sendco.done == false and never becomes true. So we 
would wait forever.

+    /* Without colo_compare_complete done == false without packets */
+    if (!g_queue_is_empty(&s->out_sendco.send_list)) {
+    AIO_WAIT_WHILE(NULL, !s->out_sendco.done);
+    }


I think it would be good to add more description for this last change. 
It's not as obvious as the previous two changes.



  g_queue_clear(&s->conn_list);
  g_queue_clear(&s->out_sendco.send_list);




--
Best regards,
Maxim Davydov
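
The idea discussed in this thread, a finalizer that tolerates an object whose "complete" step never ran, can be sketched outside QEMU as follows. This is only an illustration under stated assumptions: struct cmp_state and its functions are hypothetical, and plain malloc()/free() stand in for the bottom half, iothread and queue handling of colo-compare.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object whose resources only exist after cmp_complete(). */
struct cmp_state {
    int *event_bh; /* allocated by cmp_complete(), NULL before */
    int *iothread; /* allocated by cmp_complete(), NULL before */
    int pending;   /* packets still queued for sending */
    int released;  /* number of resources released by finalize */
};

/* Equivalent of instance_init: everything starts out zeroed. */
static void cmp_init(struct cmp_state *s)
{
    memset(s, 0, sizeof(*s));
}

/* Equivalent of "complete": acquire the resources; false on failure. */
static bool cmp_complete(struct cmp_state *s)
{
    s->event_bh = malloc(sizeof(*s->event_bh));
    s->iothread = malloc(sizeof(*s->iothread));
    if (!s->event_bh || !s->iothread) {
        free(s->event_bh);
        free(s->iothread);
        s->event_bh = NULL;
        s->iothread = NULL;
        return false;
    }
    return true;
}

/* Finalize must work whether or not cmp_complete() was ever called. */
static void cmp_finalize(struct cmp_state *s)
{
    if (s->event_bh) {
        free(s->event_bh);
        s->event_bh = NULL;
        s->released++;
    }
    if (s->iothread) {
        free(s->iothread);
        s->iothread = NULL;
        s->released++;
    }
    /* Only drain when something was actually queued. */
    if (s->pending > 0) {
        s->pending = 0;
    }
}
```

Calling cmp_init() followed directly by cmp_finalize() corresponds to the object_new_with_class() plus object_unref() sequence described above.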

[PATCH v9 19/45] hw/cxl/device: Add some trivial commands

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

GET_FW_INFO and GET_PARTITION_INFO, for this emulation, return info
equivalent to what the IDENTIFY command already returns. To have a more
robust implementation, add those.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-mailbox-utils.c | 69 ++
 1 file changed, 69 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 4ae0561dfc..c8188d7087 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -10,6 +10,7 @@
 #include "qemu/osdep.h"
 #include "hw/cxl/cxl.h"
 #include "hw/pci/pci.h"
+#include "qemu/cutils.h"
 #include "qemu/log.h"
 #include "qemu/uuid.h"
 
@@ -44,6 +45,8 @@ enum {
 #define CLEAR_RECORDS   0x1
 #define GET_INTERRUPT_POLICY   0x2
 #define SET_INTERRUPT_POLICY   0x3
+FIRMWARE_UPDATE = 0x02,
+#define GET_INFO  0x0
 TIMESTAMP   = 0x03,
 #define GET   0x0
 #define SET   0x1
@@ -52,6 +55,8 @@ enum {
 #define GET_LOG   0x1
 IDENTIFY= 0x40,
 #define MEMORY_DEVICE 0x0
+CCLS= 0x41,
+#define GET_PARTITION_INFO 0x0
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -114,6 +119,39 @@ DEFINE_MAILBOX_HANDLER_NOP(events_clear_records);
 DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
 DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
 
+/* 8.2.9.2.1 */
+static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
+ CXLDeviceState *cxl_dstate,
+ uint16_t *len)
+{
+struct {
+uint8_t slots_supported;
+uint8_t slot_info;
+uint8_t caps;
+uint8_t rsvd[0xd];
+char fw_rev1[0x10];
+char fw_rev2[0x10];
+char fw_rev3[0x10];
+char fw_rev4[0x10];
+} QEMU_PACKED *fw_info;
+QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
+
+if (cxl_dstate->pmem_size < (256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+fw_info = (void *)cmd->payload;
+memset(fw_info, 0, sizeof(*fw_info));
+
+fw_info->slots_supported = 2;
+fw_info->slot_info = BIT(0) | BIT(3);
+fw_info->caps = 0;
+pstrcpy(fw_info->fw_rev1, sizeof(fw_info->fw_rev1), "BWFW VERSION 0");
+
+*len = sizeof(*fw_info);
+return CXL_MBOX_SUCCESS;
+}
+
 /* 8.2.9.3.1 */
 static ret_code cmd_timestamp_get(struct cxl_cmd *cmd,
   CXLDeviceState *cxl_dstate,
@@ -258,6 +296,33 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
 return CXL_MBOX_SUCCESS;
 }
 
+static ret_code cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
+   CXLDeviceState *cxl_dstate,
+   uint16_t *len)
+{
+struct {
+uint64_t active_vmem;
+uint64_t active_pmem;
+uint64_t next_vmem;
+uint64_t next_pmem;
+} QEMU_PACKED *part_info = (void *)cmd->payload;
+QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
+uint64_t size = cxl_dstate->pmem_size;
+
+if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+/* PMEM only */
+part_info->active_vmem = 0;
+part_info->next_vmem = 0;
+part_info->active_pmem = size / (256 << 20);
+part_info->next_pmem = 0;
+
+*len = sizeof(*part_info);
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
@@ -271,12 +336,16 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_events_get_interrupt_policy, 0, 0 },
 [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
 cmd_events_set_interrupt_policy, 4, IMMEDIATE_CONFIG_CHANGE },
+[FIRMWARE_UPDATE][GET_INFO] = { "FIRMWARE_UPDATE_GET_INFO",
+cmd_firmware_update_get_info, 0, 0 },
 [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8, IMMEDIATE_POLICY_CHANGE },
 [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0, 0 },
 [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
 [IDENTIFY][MEMORY_DEVICE] = { "IDENTIFY_MEMORY_DEVICE",
 cmd_identify_memory_device, 0, 0 },
+[CCLS][GET_PARTITION_INFO] = { "CCLS_GET_PARTITION_INFO",
+cmd_ccls_get_partition_info, 0, 0 },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
-- 
2.32.0
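
The command table extended by this patch is a sparse two-dimensional array indexed by command set and command, filled in with designated initializers. A minimal standalone sketch of that dispatch pattern, with hypothetical return codes and a simplified handler signature (the real handlers take a struct cxl_cmd, the device state and a length pointer):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes, for illustration only. */
enum { MBOX_SUCCESS = 0, MBOX_UNSUPPORTED = 1, MBOX_INTERNAL_ERROR = 2 };

/* Opcodes taken from the patch. */
enum { CCLS = 0x41 };
enum { GET_PARTITION_INFO = 0x0 };

/* Partition sizes are reported in multiples of 256 MiB. */
#define PARTITION_UNIT (UINT64_C(256) << 20)

typedef int (*mbox_handler)(uint64_t pmem_size, uint64_t *out);

struct mbox_cmd {
    const char *name;
    mbox_handler handler;
};

/* Report active persistent capacity in 256 MiB units. */
static int get_partition_info(uint64_t pmem_size, uint64_t *out)
{
    if (pmem_size % PARTITION_UNIT) {
        return MBOX_INTERNAL_ERROR;
    }
    *out = pmem_size / PARTITION_UNIT;
    return MBOX_SUCCESS;
}

/* Entries that are not listed stay zeroed, so their handler is NULL. */
static const struct mbox_cmd cmd_set[256][256] = {
    [CCLS][GET_PARTITION_INFO] = { "CCLS_GET_PARTITION_INFO",
                                   get_partition_info },
};

/* Look up the (set, cmd) pair and run its handler if one is registered. */
static int mbox_dispatch(uint8_t set, uint8_t cmd,
                         uint64_t pmem_size, uint64_t *out)
{
    const struct mbox_cmd *c = &cmd_set[set][cmd];

    if (!c->handler) {
        return MBOX_UNSUPPORTED;
    }
    return c->handler(pmem_size, out);
}
```

The table trades memory for a lookup that needs no search: adding a command is a single initializer line, as the FIRMWARE_UPDATE and CCLS entries in the patch show.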

[PATCH v9 14/45] hw/pxb: Allow creation of a CXL PXB (host bridge)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

This works like adding a typical pxb device, except the name is
'pxb-cxl' instead of 'pxb-pcie'. An example command line would be as
follows:
  -device pxb-cxl,id=cxl.0,bus="pcie.0",bus_nr=1

A CXL PXB is backward compatible with PCIe. What this means in practice
is that an operating system that is unaware of CXL should still be able
to enumerate this topology as if it were PCIe.

One can create multiple CXL PXB host bridges, but a host bridge can only
be connected to the main root bus. Host bridges cannot appear elsewhere
in the topology.

Note that as of this patch, the ACPI tables needed for the host bridge
(specifically, an ACPI object in _SB named ACPI0016 and the CEDT) aren't
created. So while this patch internally creates it, it cannot be
properly used by an operating system or other system software.

It is also necessary to add an exception to scripts/device-crash-test
similar to that for the existing pxb, as both must be created on a PCI
Express host bus.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan.Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci-bridge/pci_expander_bridge.c | 86 -
 hw/pci/pci.c|  7 +++
 include/hw/pci/pci.h|  6 ++
 scripts/device-crash-test   |  1 +
 4 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index a6caa1e7b5..f762eb4a6e 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -17,6 +17,7 @@
 #include "hw/pci/pci_host.h"
 #include "hw/qdev-properties.h"
 #include "hw/pci/pci_bridge.h"
+#include "hw/cxl/cxl.h"
 #include "qemu/range.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
@@ -56,6 +57,16 @@ DECLARE_INSTANCE_CHECKER(PXBDev, PXB_DEV,
 DECLARE_INSTANCE_CHECKER(PXBDev, PXB_PCIE_DEV,
  TYPE_PXB_PCIE_DEVICE)
 
+#define TYPE_PXB_CXL_DEVICE "pxb-cxl"
+DECLARE_INSTANCE_CHECKER(PXBDev, PXB_CXL_DEV,
+ TYPE_PXB_CXL_DEVICE)
+
+typedef struct CXLHost {
+PCIHostState parent_obj;
+
+CXLComponentState cxl_cstate;
+} CXLHost;
+
 struct PXBDev {
 /*< private >*/
 PCIDevice parent_obj;
@@ -68,6 +79,11 @@ struct PXBDev {
 
 static PXBDev *convert_to_pxb(PCIDevice *dev)
 {
+/* A CXL PXB's parent bus is PCIe, so the normal check won't work */
+if (object_dynamic_cast(OBJECT(dev), TYPE_PXB_CXL_DEVICE)) {
+return PXB_CXL_DEV(dev);
+}
+
 return pci_bus_is_express(pci_get_bus(dev))
 ? PXB_PCIE_DEV(dev) : PXB_DEV(dev);
 }
@@ -112,11 +128,20 @@ static const TypeInfo pxb_pcie_bus_info = {
 .class_init= pxb_bus_class_init,
 };
 
+static const TypeInfo pxb_cxl_bus_info = {
+.name  = TYPE_PXB_CXL_BUS,
+.parent= TYPE_CXL_BUS,
+.instance_size = sizeof(PXBBus),
+.class_init= pxb_bus_class_init,
+};
+
 static const char *pxb_host_root_bus_path(PCIHostState *host_bridge,
   PCIBus *rootbus)
 {
-PXBBus *bus = pci_bus_is_express(rootbus) ?
-  PXB_PCIE_BUS(rootbus) : PXB_BUS(rootbus);
+PXBBus *bus = pci_bus_is_cxl(rootbus) ?
+  PXB_CXL_BUS(rootbus) :
+  pci_bus_is_express(rootbus) ? PXB_PCIE_BUS(rootbus) :
+PXB_BUS(rootbus);
 
 snprintf(bus->bus_path, 8, ":%02x", pxb_bus_num(rootbus));
 return bus->bus_path;
@@ -218,6 +243,10 @@ static int pxb_map_irq_fn(PCIDevice *pci_dev, int pin)
 return pin - PCI_SLOT(pxb->devfn);
 }
 
+static void pxb_dev_reset(DeviceState *dev)
+{
+}
+
 static gint pxb_compare(gconstpointer a, gconstpointer b)
 {
 const PXBDev *pxb_a = a, *pxb_b = b;
@@ -389,13 +418,66 @@ static const TypeInfo pxb_pcie_dev_info = {
 },
 };
 
+static void pxb_cxl_dev_realize(PCIDevice *dev, Error **errp)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+
+/* A CXL PXB's parent bus is still PCIe */
+if (!pci_bus_is_express(pci_get_bus(dev))) {
+error_setg(errp, "pxb-cxl devices cannot reside on a PCI bus");
+return;
+}
+if (!ms->cxl_devices_state->is_enabled) {
+error_setg(errp, "Machine does not have cxl=on");
+return;
+}
+
+pxb_dev_realize_common(dev, CXL, errp);
+pxb_dev_reset(DEVICE(dev));
+}
+
+static void pxb_cxl_dev_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc   = DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+k->realize = pxb_cxl_dev_realize;
+k->exit= pxb_dev_exitfn;
+/*
+ * XXX: These types of bridges don't actually show up in the hierarchy so
+ * vendor, device, class, etc. ids are intentionally left out.
+ */
+
+dc->desc = "CXL Host Bridge";
+device_class_set_props(dc, pxb_dev_properties);
+set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
+
+/* Host bridges aren't

[PATCH v9 18/45] hw/cxl/device: Implement MMIO HDM decoding (8.2.5.12)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

A device's volatile and persistent memory are known as Host-managed
Device Memory (HDM) regions. The mechanism by which the device is
programmed to claim the addresses associated with those regions is
through dedicated logic known as the HDM decoder. In order to allow the
OS to properly program the HDMs, the HDM decoders must be modeled.

There are two ways the HDM decoders can be implemented: the legacy
mechanism is through the PCIe DVSEC programming from CXL 1.1 (8.1.3.8),
and the MMIO mechanism is described in 8.2.5.12 of the spec. For now,
8.1.3.8 is not implemented.

Much of CXL device logic is implemented in cxl-utils. The HDM decoder,
however, is implemented directly by the device implementation.
Whilst the implementation currently does no validity checks on the
decoder setup, future work will add sanity checking specific to
the type of CXL component.

Signed-off-by: Ben Widawsky 
Co-developed-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/mem/cxl_type3.c | 55 ++
 1 file changed, 55 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 329a6ea2a9..5c93fbbd9b 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -50,6 +50,48 @@ static void build_dvsecs(CXLType3Dev *ct3d)
GPF_DEVICE_DVSEC_REVID, dvsec);
 }
 
+static void hdm_decoder_commit(CXLType3Dev *ct3d, int which)
+{
+ComponentRegisters *cregs = &ct3d->cxl_cstate.crb;
+uint32_t *cache_mem = cregs->cache_mem_registers;
+
+assert(which == 0);
+
+/* TODO: Sanity checks that the decoder is possible */
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMIT, 0);
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERR, 0);
+
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
+}
+
+static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
+   unsigned size)
+{
+CXLComponentState *cxl_cstate = opaque;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+CXLType3Dev *ct3d = container_of(cxl_cstate, CXLType3Dev, cxl_cstate);
+uint32_t *cache_mem = cregs->cache_mem_registers;
+bool should_commit = false;
+int which_hdm = -1;
+
+assert(size == 4);
+g_assert(offset <= CXL2_COMPONENT_CM_REGION_SIZE);
+
+switch (offset) {
+case A_CXL_HDM_DECODER0_CTRL:
+should_commit = FIELD_EX32(value, CXL_HDM_DECODER0_CTRL, COMMIT);
+which_hdm = 0;
+break;
+default:
+break;
+}
+
+stl_le_p((uint8_t *)cache_mem + offset, value);
+if (should_commit) {
+hdm_decoder_commit(ct3d, which_hdm);
+}
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
 MemoryRegion *mr;
@@ -93,6 +135,9 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 ct3d->cxl_cstate.pdev = pci_dev;
 build_dvsecs(ct3d);
 
+regs->special_ops = g_new0(MemoryRegionOps, 1);
+regs->special_ops->write = ct3d_reg_write;
+
 cxl_component_register_block_init(OBJECT(pci_dev), cxl_cstate,
   TYPE_CXL_TYPE3);
 
@@ -107,6 +152,15 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
  &ct3d->cxl_dstate.device_registers);
 }
 
+static void ct3_exit(PCIDevice *pci_dev)
+{
+CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
+CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
+ComponentRegisters *regs = &cxl_cstate->crb;
+
+g_free(regs->special_ops);
+}
+
 static void ct3d_reset(DeviceState *dev)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(dev);
@@ -128,6 +182,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
 
 pc->realize = ct3_realize;
+pc->exit = ct3_exit;
 pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
 pc->vendor_id = PCI_VENDOR_ID_INTEL;
 pc->device_id = 0xd93; /* LVF for now */
-- 
2.32.0
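
The write handler in this patch stores the incoming value first and then, if the guest set COMMIT in the decoder 0 control register, clears COMMIT and ERR and sets COMMITTED. A minimal standalone sketch of that sequence follows. The bit positions and the register index are assumptions made for illustration; in QEMU they come from the CXL_HDM_DECODER0_CTRL field definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define CTRL_COMMIT    (UINT32_C(1) << 9)
#define CTRL_COMMITTED (UINT32_C(1) << 10)
#define CTRL_ERR       (UINT32_C(1) << 11)

#define NUM_REGS       4
#define CTRL_REG_INDEX 0 /* hypothetical index of the decoder 0 control reg */

/* Emulate a 32-bit register write with commit handling. */
static void reg_write(uint32_t regs[NUM_REGS], unsigned idx, uint32_t value)
{
    assert(idx < NUM_REGS);

    /* Store the raw value first, as the patch does. */
    regs[idx] = value;

    /* A set COMMIT bit in the control register triggers the commit. */
    if (idx == CTRL_REG_INDEX && (value & CTRL_COMMIT)) {
        regs[idx] &= ~(CTRL_COMMIT | CTRL_ERR);
        regs[idx] |= CTRL_COMMITTED;
    }
}
```

Writes to any other register pass through unchanged, which matches the default branch of the switch in ct3d_reg_write().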

[PATCH v9 11/45] hw/pxb: Use a type for realizing expanders

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

This opens up the possibility for more types of expanders (other than
PCI and PCIe). We'll need this to create a CXL expander.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci-bridge/pci_expander_bridge.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index de932286b5..d4514227a8 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -24,6 +24,8 @@
 #include "hw/boards.h"
 #include "qom/object.h"
 
+enum BusType { PCI, PCIE };
+
 #define TYPE_PXB_BUS "pxb-bus"
 typedef struct PXBBus PXBBus;
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_BUS,
@@ -221,7 +223,8 @@ static gint pxb_compare(gconstpointer a, gconstpointer b)
0;
 }
 
-static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
+static void pxb_dev_realize_common(PCIDevice *dev, enum BusType type,
+   Error **errp)
 {
 PXBDev *pxb = convert_to_pxb(dev);
 DeviceState *ds, *bds = NULL;
@@ -246,7 +249,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
 }
 
 ds = qdev_new(TYPE_PXB_HOST);
-if (pcie) {
+if (type == PCIE) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
 } else {
 bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, TYPE_PXB_BUS);
@@ -295,7 +298,7 @@ static void pxb_dev_realize(PCIDevice *dev, Error **errp)
 return;
 }
 
-pxb_dev_realize_common(dev, false, errp);
+pxb_dev_realize_common(dev, PCI, errp);
 }
 
 static void pxb_dev_exitfn(PCIDevice *pci_dev)
@@ -348,7 +351,7 @@ static void pxb_pcie_dev_realize(PCIDevice *dev, Error **errp)
 return;
 }
 
-pxb_dev_realize_common(dev, true, errp);
+pxb_dev_realize_common(dev, PCIE, errp);
 }
 
 static void pxb_pcie_dev_class_init(ObjectClass *klass, void *data)
-- 
2.32.0
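
The point of replacing the bool with an enum is that a third expander type later becomes one more case instead of a second flag. A small standalone sketch of the idea; the names are hypothetical, and the "pxb-pcie-bus" string is an assumption about the value of TYPE_PXB_PCIE_BUS (only "pxb-bus" appears in the patch context):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of enum BusType from the patch. */
enum bus_type { BUS_PCI, BUS_PCIE };

/* Map an expander type to the QOM type name of the bus it creates. */
static const char *bus_type_name(enum bus_type type)
{
    switch (type) {
    case BUS_PCIE:
        return "pxb-pcie-bus"; /* assumed value of TYPE_PXB_PCIE_BUS */
    case BUS_PCI:
        return "pxb-bus";
    }
    return NULL; /* unknown type */
}
```

With a switch over the enum, compilers that support -Wswitch can also warn when a newly added type is not handled.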

[PATCH v9 08/45] hw/cxl/device: Add cheap EVENTS implementation (8.2.9.1)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

Using the previously implemented stubbed helpers, it is now possible to
easily add the missing, required commands to the implementation.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-mailbox-utils.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 2557f41f61..fb1f53f48e 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -38,6 +38,14 @@
  *  a register interface that already deals with it.
  */
 
+enum {
+EVENTS  = 0x01,
+#define GET_RECORDS   0x0
+#define CLEAR_RECORDS   0x1
+#define GET_INTERRUPT_POLICY   0x2
+#define SET_INTERRUPT_POLICY   0x3
+};
+
 /* 8.2.8.4.5.1 Command Return Codes */
 typedef enum {
 CXL_MBOX_SUCCESS = 0x0,
@@ -93,9 +101,26 @@ struct cxl_cmd {
 return CXL_MBOX_SUCCESS;  \
 }
 
+DEFINE_MAILBOX_HANDLER_ZEROED(events_get_records, 0x20);
+DEFINE_MAILBOX_HANDLER_NOP(events_clear_records);
+DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
+DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
+
 static QemuUUID cel_uuid;
 
-static struct cxl_cmd cxl_cmd_set[256][256] = {};
+#define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_LOG_CHANGE (1 << 4)
+
+static struct cxl_cmd cxl_cmd_set[256][256] = {
+[EVENTS][GET_RECORDS] = { "EVENTS_GET_RECORDS",
+cmd_events_get_records, 1, 0 },
+[EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
+cmd_events_clear_records, ~0, IMMEDIATE_LOG_CHANGE },
+[EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
+cmd_events_get_interrupt_policy, 0, 0 },
+[EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
+cmd_events_set_interrupt_policy, 4, IMMEDIATE_CONFIG_CHANGE },
+};
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 {
-- 
2.32.0

[PATCH v9 12/45] hw/pci/cxl: Create a CXL bus type

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

The easiest way to differentiate a CXL bus from a PCIe bus is using a
flag. A CXL bus, in hardware, is backward compatible with PCIe, and
therefore the code tries pretty hard to keep them in sync as much as
possible.

The other way to implement this would be to try to cast the bus to the
correct type. Using a flag is less code and is useful for debugging, as
the bus type can be seen by simply looking at the flags.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci-bridge/pci_expander_bridge.c | 9 -
 include/hw/pci/pci_bus.h| 7 +++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index d4514227a8..a6caa1e7b5 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -24,7 +24,7 @@
 #include "hw/boards.h"
 #include "qom/object.h"
 
-enum BusType { PCI, PCIE };
+enum BusType { PCI, PCIE, CXL };
 
 #define TYPE_PXB_BUS "pxb-bus"
 typedef struct PXBBus PXBBus;
@@ -35,6 +35,10 @@ DECLARE_INSTANCE_CHECKER(PXBBus, PXB_BUS,
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_PCIE_BUS,
  TYPE_PXB_PCIE_BUS)
 
+#define TYPE_PXB_CXL_BUS "pxb-cxl-bus"
+DECLARE_INSTANCE_CHECKER(PXBBus, PXB_CXL_BUS,
+ TYPE_PXB_CXL_BUS)
+
 struct PXBBus {
 /*< private >*/
 PCIBus parent_obj;
@@ -251,6 +255,9 @@ static void pxb_dev_realize_common(PCIDevice *dev, enum BusType type,
 ds = qdev_new(TYPE_PXB_HOST);
 if (type == PCIE) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
+} else if (type == CXL) {
+bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_CXL_BUS);
+bus->flags |= PCI_BUS_CXL;
 } else {
 bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, TYPE_PXB_BUS);
 bds = qdev_new("pci-bridge");
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 347440d42c..eb94e7e85c 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -24,6 +24,8 @@ enum PCIBusFlags {
 PCI_BUS_IS_ROOT = 0x0001,
 /* PCIe extended configuration space is accessible on this bus */
 PCI_BUS_EXTENDED_CONFIG_SPACE   = 0x0002,
+/* This is a CXL Type BUS */
+PCI_BUS_CXL = 0x0004,
 };
 
 struct PCIBus {
@@ -53,6 +55,11 @@ struct PCIBus {
 Notifier machine_done;
 };
 
+static inline bool pci_bus_is_cxl(PCIBus *bus)
+{
+return !!(bus->flags & PCI_BUS_CXL);
+}
+
 static inline bool pci_bus_is_root(PCIBus *bus)
 {
 return !!(bus->flags & PCI_BUS_IS_ROOT);
-- 
2.32.0
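
A minimal standalone sketch of the flag approach described in this patch:
a CXL bus stays a normal bus object and only carries one extra bit. The
Demo* names are hypothetical stand-ins and not the QEMU types; only the
flag values mirror the ones in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for PCIBusFlags and PCIBus; not the QEMU types. */
enum DemoBusFlags {
    DEMO_BUS_IS_ROOT               = 0x0001,
    DEMO_BUS_EXTENDED_CONFIG_SPACE = 0x0002,
    DEMO_BUS_CXL                   = 0x0004,
};

typedef struct DemoBus {
    uint32_t flags;
} DemoBus;

/* A CXL bus is still a PCIe bus; the flag only adds information. */
static inline bool demo_bus_is_cxl(const DemoBus *bus)
{
    return (bus->flags & DEMO_BUS_CXL) != 0;
}

static inline bool demo_bus_is_root(const DemoBus *bus)
{
    return (bus->flags & DEMO_BUS_IS_ROOT) != 0;
}
```

Because the check is a single mask test, existing PCIe code paths keep
working on such a bus and only CXL-aware code needs to look at the bit.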

[PATCH v9 10/45] hw/cxl/device: Add log commands (8.2.9.4) + CEL

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

CXL specification provides for the ability to obtain logs from the
device. Logs are either spec defined, like the "Command Effects Log"
(CEL), or vendor specific. UUIDs are defined for all log types.

The CEL is a mechanism to provide information to the host about which
commands are supported. It is useful both for determining which spec'd
optional commands are supported and for providing a list of vendor
specified commands that might be used. The CEL is already created as
part of mailbox initialization, but here it is now exported to hosts
that use these log commands.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-mailbox-utils.c | 69 ++
 1 file changed, 69 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 4584aa31f7..db473135c7 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -47,6 +47,9 @@ enum {
 TIMESTAMP   = 0x03,
 #define GET   0x0
 #define SET   0x1
+LOGS= 0x04,
+#define GET_SUPPORTED 0x0
+#define GET_LOG   0x1
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -147,6 +150,70 @@ static ret_code cmd_timestamp_set(struct cxl_cmd *cmd,
 
 static QemuUUID cel_uuid;
 
+/* 8.2.9.4.1 */
+static ret_code cmd_logs_get_supported(struct cxl_cmd *cmd,
+   CXLDeviceState *cxl_dstate,
+   uint16_t *len)
+{
+struct {
+uint16_t entries;
+uint8_t rsvd[6];
+struct {
+QemuUUID uuid;
+uint32_t size;
+} log_entries[1];
+} QEMU_PACKED *supported_logs = (void *)cmd->payload;
+QEMU_BUILD_BUG_ON(sizeof(*supported_logs) != 0x1c);
+
+supported_logs->entries = 1;
+supported_logs->log_entries[0].uuid = cel_uuid;
+supported_logs->log_entries[0].size = 4 * cxl_dstate->cel_size;
+
+*len = sizeof(*supported_logs);
+return CXL_MBOX_SUCCESS;
+}
+
+/* 8.2.9.4.2 */
+static ret_code cmd_logs_get_log(struct cxl_cmd *cmd,
+ CXLDeviceState *cxl_dstate,
+ uint16_t *len)
+{
+struct {
+QemuUUID uuid;
+uint32_t offset;
+uint32_t length;
+} QEMU_PACKED QEMU_ALIGNED(16) *get_log = (void *)cmd->payload;
+
+/*
+ * 8.2.9.4.2
+ *   The device shall return Invalid Parameter if the Offset or Length
+ *   fields attempt to access beyond the size of the log as reported by Get
+ *   Supported Logs.
+ *
+ * XXX: Spec is wrong, "Invalid Parameter" isn't a thing.
+ * XXX: Spec doesn't address an incorrect UUID.
+ *
+ * The CEL buffer is large enough to fit all commands in the emulation, so
+ * the only possible failure would be if the mailbox itself isn't big
+ * enough.
+ */
+if (get_log->offset + get_log->length > cxl_dstate->payload_size) {
+return CXL_MBOX_INVALID_INPUT;
+}
+
+if (!qemu_uuid_is_equal(&get_log->uuid, &cel_uuid)) {
+return CXL_MBOX_UNSUPPORTED;
+}
+
+/* Store off everything to local variables so we can wipe out the payload */
+*len = get_log->length;
+
+memmove(cmd->payload, cxl_dstate->cel_log + get_log->offset,
+   get_log->length);
+
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
@@ -162,6 +229,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_events_set_interrupt_policy, 4, IMMEDIATE_CONFIG_CHANGE },
 [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8, IMMEDIATE_POLICY_CHANGE },
+[LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0, 0 },
+[LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
-- 
2.32.0
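
The Get Log handler above compares offset + length against a size. Both
fields are 32-bit, so the sum could wrap for very large inputs. A minimal
sketch of one way to write such a range check without the addition is
shown here; demo_log_range_ok() and its parameters are hypothetical and
not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when [offset, offset + length) lies inside a
 * log of log_size bytes. The subtraction is only evaluated once offset is
 * known to be <= log_size, so nothing here can wrap around.
 */
static inline bool demo_log_range_ok(uint32_t offset, uint32_t length,
                                     uint32_t log_size)
{
    return offset <= log_size && length <= log_size - offset;
}
```

Which size to compare against (the log size reported by Get Supported
Logs, or the mailbox payload size) is a separate question that this
sketch does not try to settle.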

[PATCH v9 04/45] hw/cxl/device: Introduce a CXL device (8.2.8)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

A CXL device is a type of CXL component. Conceptually, a CXL device
would be a leaf node in a CXL topology. From an emulation perspective,
CXL devices are the most complex and so the actual implementation is
reserved for discrete commits.

This new device type is specifically catered towards the eventual
implementation of a Type3 CXL.mem device, 8.2.8.5 in the CXL 2.0
specification.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
Reviewed-by: Adam Manzanares 
---
 include/hw/cxl/cxl.h|   1 +
 include/hw/cxl/cxl_device.h | 166 
 2 files changed, 167 insertions(+)

diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 8c738c7a2b..b9d1ac3fad 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -12,5 +12,6 @@
 
 #include "cxl_pci.h"
 #include "cxl_component.h"
+#include "cxl_device.h"
 
 #endif
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
new file mode 100644
index 00..9513aaac77
--- /dev/null
+++ b/include/hw/cxl/cxl_device.h
@@ -0,0 +1,166 @@
+/*
+ * QEMU CXL Devices
+ *
+ * Copyright (c) 2020 Intel
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef CXL_DEVICE_H
+#define CXL_DEVICE_H
+
+#include "hw/register.h"
+
+/*
+ * The following is how a CXL device's Memory Device registers are laid out.
+ * The only requirement from the spec is that the capabilities array and the
+ * capability headers start at offset 0 and are contiguously packed. The headers
+ * themselves provide offsets to the register fields. For this emulation, the
+ * actual registers will start at offset 0x80 (m == 0x80). No secondary
+ * mailbox is implemented which means that the offset of the start of the
+ * mailbox payload (n) is given by
+ * n = m + sizeof(mailbox registers) + sizeof(device registers).
+ *
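+ *                       +---------------------------------+
+ *                       |                                 |
+ *                       |    Memory Device Registers      |
+ *                       |                                 |
+ * n + PAYLOAD_SIZE_MAX  -----------------------------------
+ *                  ^    |                                 |
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  |    |         Mailbox Payload         |
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  n    -----------------------------------
+ *                  ^    |       Mailbox Registers         |
+ *                  |    |                                 |
+ *                  |    -----------------------------------
+ *                  |    |                                 |
+ *                  |    |        Device Registers         |
+ *                  |    |                                 |
+ *                  m    ---------------------------------->
+ *                  ^    |  Memory Device Capability Header|
+ *                  |    -----------------------------------
+ *                  |    |     Mailbox Capability Header   |
+ *                  |    -----------------------------------
+ *                  |    |     Device Capability Header    |
+ *                  |    -----------------------------------
+ *                  |    |     Device Cap Array Register   |
+ *                  0    +---------------------------------+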
+ *
+ */
+
+#define CXL_DEVICE_CAP_HDR1_OFFSET 0x10 /* Figure 138 */
+#define CXL_DEVICE_CAP_REG_SIZE 0x10 /* 8.2.8.2 */
+#define CXL_DEVICE_CAPS_MAX 4 /* 8.2.8.2.1 + 8.2.8.5 */
+
+#define CXL_DEVICE_STATUS_REGISTERS_OFFSET 0x80 /* Read comment above */
+#define CXL_DEVICE_STATUS_REGISTERS_LENGTH 0x8 /* 8.2.8.3.1 */
+
+#define CXL_MAILBOX_REGISTERS_OFFSET \
+(CXL_DEVICE_STATUS_REGISTERS_OFFSET + CXL_DEVICE_STATUS_REGISTERS_LENGTH)
+#define CXL_MAILBOX_REGISTERS_SIZE 0x20 /* 8.2.8.4, Figure 139 */
+#define CXL_MAILBOX_PAYLOAD_SHIFT 11
+#define CXL_MAILBOX_MAX_PAYLOAD_SIZE (1 << CXL_MAILBOX_PAYLOAD_SHIFT)
+#define CXL_MAILBOX_REGISTERS_LENGTH \
+(CXL_MAILBOX_REGISTERS_SIZE + CXL_MAILBOX_MAX_PAYLOAD_SIZE)
+
+typedef struct cxl_device_state {
+MemoryRegion device_registers;
+
+/* mmio for device capabilities array - 8.2.8.2 */
+MemoryRegion device;
+MemoryRegion caps;
+
+/* mmio for the mailbox registers 8.2.8.4 */
+MemoryRegion mailbox;
+
+/* memory region for persistent memory, HDM */
+uint64_t pmem_size;
+} CXLDeviceState;
+
+/* Initialize the register block for a device */
+void cxl_device_register_block_init(Object *obj,
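
The layout comment in this header gives n = m + sizeof(mailbox registers)
+ sizeof(device registers). A small worked sketch of that arithmetic,
using the constant values from the macros in this patch; the DEMO_* names
and helper functions are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the macros in the patch; the names are hypothetical. */
#define DEMO_STATUS_OFFSET   0x80u   /* m: start of the device registers */
#define DEMO_STATUS_LENGTH   0x8u    /* size of the device registers */
#define DEMO_MBOX_REG_SIZE   0x20u   /* mailbox registers before the payload */
#define DEMO_PAYLOAD_SHIFT   11
#define DEMO_PAYLOAD_MAX     (1u << DEMO_PAYLOAD_SHIFT)

/* Start of the mailbox registers: directly after the device registers. */
static inline uint32_t demo_mailbox_offset(void)
{
    return DEMO_STATUS_OFFSET + DEMO_STATUS_LENGTH;
}

/* n: start of the mailbox payload, directly after the mailbox registers. */
static inline uint32_t demo_payload_offset(void)
{
    return demo_mailbox_offset() + DEMO_MBOX_REG_SIZE;
}

/* n + PAYLOAD_SIZE_MAX: first byte past the mailbox payload. */
static inline uint32_t demo_payload_end(void)
{
    return demo_payload_offset() + DEMO_PAYLOAD_MAX;
}
```

With these values the mailbox registers start at 0x88, the payload starts
at n = 0xA8, and the payload ends at 0x8A8.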

[PATCH v9 09/45] hw/cxl/device: Timestamp implementation (8.2.9.3)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

Errata F4 to CXL 2.0 clarified the meaning of the timer as the
sum of the value set with the timestamp set command and the number
of nanoseconds since it was last set.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-mailbox-utils.c  | 42 +
 include/hw/cxl/cxl_device.h |  6 ++
 2 files changed, 48 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index fb1f53f48e..4584aa31f7 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -44,6 +44,9 @@ enum {
 #define CLEAR_RECORDS   0x1
 #define GET_INTERRUPT_POLICY   0x2
 #define SET_INTERRUPT_POLICY   0x3
+TIMESTAMP   = 0x03,
+#define GET   0x0
+#define SET   0x1
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -106,9 +109,46 @@ DEFINE_MAILBOX_HANDLER_NOP(events_clear_records);
 DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
 DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
 
+/* 8.2.9.3.1 */
+static ret_code cmd_timestamp_get(struct cxl_cmd *cmd,
+  CXLDeviceState *cxl_dstate,
+  uint16_t *len)
+{
+uint64_t time, delta;
+uint64_t final_time = 0;
+
+if (cxl_dstate->timestamp.set) {
+/* First find the delta from the last time the host set the time. */
+time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+delta = time - cxl_dstate->timestamp.last_set;
+final_time = cxl_dstate->timestamp.host_set + delta;
+}
+
+/* Then adjust the actual time */
+stq_le_p(cmd->payload, final_time);
+*len = 8;
+
+return CXL_MBOX_SUCCESS;
+}
+
+/* 8.2.9.3.2 */
+static ret_code cmd_timestamp_set(struct cxl_cmd *cmd,
+  CXLDeviceState *cxl_dstate,
+  uint16_t *len)
+{
+cxl_dstate->timestamp.set = true;
+cxl_dstate->timestamp.last_set = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+
+cxl_dstate->timestamp.host_set = le64_to_cpu(*(uint64_t *)cmd->payload);
+
+*len = 0;
+return CXL_MBOX_SUCCESS;
+}
+
 static QemuUUID cel_uuid;
 
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
 
 static struct cxl_cmd cxl_cmd_set[256][256] = {
@@ -120,6 +160,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_events_get_interrupt_policy, 0, 0 },
 [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
 cmd_events_set_interrupt_policy, 4, IMMEDIATE_CONFIG_CHANGE },
+[TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
+[TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8, IMMEDIATE_POLICY_CHANGE },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 954205653e..797a22ddb4 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -111,6 +111,12 @@ typedef struct cxl_device_state {
 size_t cel_size;
 };
 
+struct {
+bool set;
+uint64_t last_set;
+uint64_t host_set;
+} timestamp;
+
 /* memory region for persistent memory, HDM */
 uint64_t pmem_size;
 } CXLDeviceState;
-- 
2.32.0
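
A minimal standalone sketch of the timestamp rule described in this patch:
the reported time is the value the host last wrote plus the nanoseconds
elapsed since that write, and zero if the host never set it. The
DemoTimestamp type and the now_ns parameter are hypothetical; now_ns
stands in for the virtual clock reading so the sketch has no QEMU
dependency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the timestamp fields added to CXLDeviceState. */
typedef struct DemoTimestamp {
    bool set;            /* has the host ever set the timestamp? */
    uint64_t last_set;   /* clock reading (ns) when it was set */
    uint64_t host_set;   /* value the host wrote */
} DemoTimestamp;

/* Record the host value together with the clock reading at that moment. */
static inline void demo_timestamp_set(DemoTimestamp *ts, uint64_t host_value,
                                      uint64_t now_ns)
{
    ts->set = true;
    ts->last_set = now_ns;
    ts->host_set = host_value;
}

/* Host value plus elapsed nanoseconds, or 0 if it was never set. */
static inline uint64_t demo_timestamp_get(const DemoTimestamp *ts,
                                          uint64_t now_ns)
{
    if (!ts->set) {
        return 0;
    }
    return ts->host_set + (now_ns - ts->last_set);
}
```

For example, setting the value 5000 when the clock reads 1000 and reading
it back when the clock reads 1500 gives 5500.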

[PATCH v9 07/45] hw/cxl/device: Add memory device utilities

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

Memory devices implement extra capabilities on top of CXL devices. This
adds support for that.

A large part of memory devices is the mailbox/command interface. All of
the mailbox handling is done in the mailbox-utils library. Longer term,
new CXL devices that are being emulated may want to handle commands
differently, and therefore would need a mechanism to opt in/out of the
specific generic handlers. As such, this is considered sufficient for
now, but may need more depth in the future.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-device-utils.c   | 38 -
 include/hw/cxl/cxl_device.h | 21 +---
 2 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index f6c3e0f095..687759b301 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -131,6 +131,31 @@ static void mailbox_reg_write(void *opaque, hwaddr offset, uint64_t value,
 }
 }
 
+static uint64_t mdev_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+uint64_t retval = 0;
+
+retval = FIELD_DP64(retval, CXL_MEM_DEV_STS, MEDIA_STATUS, 1);
+retval = FIELD_DP64(retval, CXL_MEM_DEV_STS, MBOX_READY, 1);
+
+return retval;
+}
+
+static const MemoryRegionOps mdev_ops = {
+.read = mdev_reg_read,
+.write = NULL, /* memory device register is read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 8,
+.max_access_size = 8,
+},
+};
+
 static const MemoryRegionOps mailbox_ops = {
 .read = mailbox_reg_read,
 .write = mailbox_reg_write,
@@ -188,6 +213,9 @@ void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
   "device-status", CXL_DEVICE_STATUS_REGISTERS_LENGTH);
 memory_region_init_io(&cxl_dstate->mailbox, obj, &mailbox_ops, cxl_dstate,
   "mailbox", CXL_MAILBOX_REGISTERS_LENGTH);
+memory_region_init_io(&cxl_dstate->memory_device, obj, &mdev_ops,
+  cxl_dstate, "memory device caps",
+  CXL_MEMORY_DEVICE_REGISTERS_LENGTH);
 
 memory_region_add_subregion(&cxl_dstate->device_registers, 0,
 &cxl_dstate->caps);
@@ -197,6 +225,9 @@ void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
 memory_region_add_subregion(&cxl_dstate->device_registers,
 CXL_MAILBOX_REGISTERS_OFFSET,
 &cxl_dstate->mailbox);
+memory_region_add_subregion(&cxl_dstate->device_registers,
+CXL_MEMORY_DEVICE_REGISTERS_OFFSET,
+&cxl_dstate->memory_device);
 }
 
 static void device_reg_init_common(CXLDeviceState *cxl_dstate) { }
@@ -209,10 +240,12 @@ static void mailbox_reg_init_common(CXLDeviceState *cxl_dstate)
 cxl_dstate->payload_size = CXL_MAILBOX_MAX_PAYLOAD_SIZE;
 }
 
+static void memdev_reg_init_common(CXLDeviceState *cxl_dstate) { }
+
 void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
 {
 uint64_t *cap_hdrs = cxl_dstate->caps_reg_state64;
-const int cap_count = 2;
+const int cap_count = 3;
 
 /* CXL Device Capabilities Array Register */
 ARRAY_FIELD_DP64(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_ID, 0);
@@ -225,5 +258,8 @@ void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
 cxl_device_cap_init(cxl_dstate, MAILBOX, 2);
 mailbox_reg_init_common(cxl_dstate);
 
+cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000);
+memdev_reg_init_common(cxl_dstate);
+
 assert(cxl_initialize_mailbox(cxl_dstate) == 0);
 }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 35489f635a..954205653e 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -72,15 +72,20 @@
 #define CXL_MAILBOX_REGISTERS_LENGTH \
 (CXL_MAILBOX_REGISTERS_SIZE + CXL_MAILBOX_MAX_PAYLOAD_SIZE)
 
-#define CXL_MMIO_SIZE   \
-(CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_STATUS_REGISTERS_LENGTH + \
- CXL_MAILBOX_REGISTERS_LENGTH)
+#define CXL_MEMORY_DEVICE_REGISTERS_OFFSET \
+(CXL_MAILBOX_REGISTERS_OFFSET + CXL_MAILBOX_REGISTERS_LENGTH)
+#define CXL_MEMORY_DEVICE_REGISTERS_LENGTH 0x8
+
+#define CXL_MMIO_SIZE   \
+(CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_STATUS_REGISTERS_LENGTH + \
+ CXL_MAILBOX_REGISTERS_LENGTH + CXL_MEMORY_DEVICE_REGISTERS_LENGTH)
 
 typedef struct cxl_device_state {
 MemoryRegion device_registers;
 
 /* mmio for device capabilities array - 8.2.8.2 */
 MemoryRegion device;
+MemoryRegion memory_device;
 struct {
 MemoryRegion caps;
 union {
@@ -153,6 +158,9 @@ REG64(CXL_DEV_CAP_ARRAY, 0) /*
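
The mdev_reg_read() handler in this patch builds the status value by
depositing fields into a 64-bit word. A minimal sketch of what such a
deposit does is shown here. The demo_deposit64() helper is hypothetical,
and the field positions below (media status at bits 2-3, mailbox ready at
bit 4) are assumptions made for illustration; they are not taken from the
patch text.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions, for illustration only. */
#define DEMO_MEDIA_STATUS_SHIFT  2
#define DEMO_MEDIA_STATUS_LENGTH 2
#define DEMO_MBOX_READY_SHIFT    4
#define DEMO_MBOX_READY_LENGTH   1

/*
 * Replace the bits [shift, shift + length) of value with field.
 * Requires 1 <= length and shift + length <= 64.
 */
static inline uint64_t demo_deposit64(uint64_t value, unsigned shift,
                                      unsigned length, uint64_t field)
{
    uint64_t ones = (length == 64) ? ~0ULL : ((1ULL << length) - 1);
    uint64_t mask = ones << shift;

    return (value & ~mask) | ((field << shift) & mask);
}

/* Build a status word the same way the read handler does. */
static inline uint64_t demo_mem_dev_status(void)
{
    uint64_t v = 0;

    v = demo_deposit64(v, DEMO_MEDIA_STATUS_SHIFT,
                       DEMO_MEDIA_STATUS_LENGTH, 1);
    v = demo_deposit64(v, DEMO_MBOX_READY_SHIFT,
                       DEMO_MBOX_READY_LENGTH, 1);
    return v;
}
```

Masking the shifted field means a value that is too wide for its field
cannot spill into neighbouring bits.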

[PATCH v9 05/45] hw/cxl/device: Implement the CAP array (8.2.8.1-2)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

This implements all device MMIO up to the first capability. That
includes the CXL Device Capabilities Array Register, as well as all of
the CXL Device Capability Header Registers. The latter are filled in as
they are implemented in the following patches.

Endianness and alignment are managed by softmmu memory core.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-device-utils.c   | 109 
 hw/cxl/meson.build  |   1 +
 include/hw/cxl/cxl_device.h |  31 +-
 3 files changed, 140 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
new file mode 100644
index 00..241f9f82e3
--- /dev/null
+++ b/hw/cxl/cxl-device-utils.c
@@ -0,0 +1,109 @@
+/*
+ * CXL Utility library for devices
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/cxl/cxl.h"
+
+/*
+ * Device registers have no restrictions per the spec, and so fall back to the
+ * default memory mapped register rules in 8.2:
+ *   Software shall use CXL.io Memory Read and Write to access memory mapped
+ *   register defined in this section. Unless otherwise specified, software
+ *   shall restrict the accesses width based on the following:
+ *   • A 32 bit register shall be accessed as a 1 Byte, 2 Bytes or 4 Bytes
+ * quantity.
+ *   • A 64 bit register shall be accessed as a 1 Byte, 2 Bytes, 4 Bytes or 8
+ * Bytes
+ *   • The address shall be a multiple of the access width, e.g. when
+ * accessing a register as a 4 Byte quantity, the address shall be
+ * multiple of 4.
+ *   • The accesses shall map to contiguous bytes. If these rules are not
+ * followed, the behavior is undefined
+ */
+
+static uint64_t caps_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+if (size == 4) {
+return cxl_dstate->caps_reg_state32[offset / sizeof(*cxl_dstate->caps_reg_state32)];
+} else {
+return cxl_dstate->caps_reg_state64[offset / sizeof(*cxl_dstate->caps_reg_state64)];
+}
+}
+
+static uint64_t dev_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+return 0;
+}
+
+static const MemoryRegionOps dev_ops = {
+.read = dev_reg_read,
+.write = NULL, /* status register is read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 1,
+.max_access_size = 8,
+},
+};
+
+static const MemoryRegionOps caps_ops = {
+.read = caps_reg_read,
+.write = NULL, /* caps registers are read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+};
+
+void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
+{
+/* This will be a BAR, so needs to be rounded up to pow2 for PCI spec */
+memory_region_init(&cxl_dstate->device_registers, obj, "device-registers",
+   pow2ceil(CXL_MMIO_SIZE));
+
+memory_region_init_io(&cxl_dstate->caps, obj, &caps_ops, cxl_dstate,
+  "cap-array", CXL_CAPS_SIZE);
+memory_region_init_io(&cxl_dstate->device, obj, &dev_ops, cxl_dstate,
+  "device-status", CXL_DEVICE_STATUS_REGISTERS_LENGTH);
+
+memory_region_add_subregion(&cxl_dstate->device_registers, 0,
+&cxl_dstate->caps);
+memory_region_add_subregion(&cxl_dstate->device_registers,
+CXL_DEVICE_STATUS_REGISTERS_OFFSET,
+&cxl_dstate->device);
+}
+
+static void device_reg_init_common(CXLDeviceState *cxl_dstate) { }
+
+void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
+{
+uint64_t *cap_hdrs = cxl_dstate->caps_reg_state64;
+const int cap_count = 1;
+
+/* CXL Device Capabilities Array Register */
+ARRAY_FIELD_DP64(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_ID, 0);
+ARRAY_FIELD_DP64(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_VERSION, 1);
+ARRAY_FIELD_DP64(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_COUNT, cap_count);
+
+cxl_device_cap_init(cxl_dstate, DEVICE_STATUS, 1);
+device_reg_init_common(cxl_dstate);
+}
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index 3231b5de1e..dd7c6f8e5a 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -1,4 +1,5 @@
 softmmu_ss.add(when: 'CONFIG_CXL',
if_true: files(
'cxl-component-utils.c',
+   'cxl-device-utils.c',
))
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index
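
The access rules quoted in the comment of this patch (allowed widths per
register size, and natural alignment) can be sketched as a small
predicate. demo_access_ok() is a hypothetical helper for illustration; in
the patch itself these constraints are expressed through the .valid
fields of MemoryRegionOps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check of the quoted rules. reg_width is the register
 * width in bytes (4 for a 32 bit register, 8 for a 64 bit register).
 */
static inline bool demo_access_ok(uint64_t offset, unsigned size,
                                  unsigned reg_width)
{
    /* Only 1, 2, 4 and 8 byte accesses are defined. */
    bool size_ok = size == 1 || size == 2 || size == 4 || size == 8;

    /* Not wider than the register, and address a multiple of the width. */
    return size_ok && size <= reg_width && (offset % size) == 0;
}
```

An 8 byte read at offset 0x4, for instance, is rejected because the
address is not a multiple of the access width.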

[PATCH v9 03/45] MAINTAINERS: Add entry for Compute Express Link Emulation

2022-04-04 Thread Jonathan Cameron via

From: Jonathan Cameron 

The CXL emulation will be jointly maintained by Ben Widawsky
and Jonathan Cameron.  Broken out as a separate patch
to improve visibility.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index cc364afef7..1b09419977 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2544,6 +2544,13 @@ F: qapi/block*.json
 F: qapi/transaction.json
 T: git https://repo.or.cz/qemu/armbru.git block-next
 
+Compute Express Link
+M: Ben Widawsky 
+M: Jonathan Cameron 
+S: Supported
+F: hw/cxl/
+F: include/hw/cxl/
+
 Dirty Bitmaps
 M: Eric Blake 
 M: Vladimir Sementsov-Ogievskiy 
-- 
2.32.0

[PATCH v9 06/45] hw/cxl/device: Implement basic mailbox (8.2.8.4)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

This is the beginning of implementing mailbox support for CXL 2.0
devices. The implementation recognizes when the doorbell is rung,
handles the command/payload, and clears the doorbell while returning error
codes and data.

Generally the mailbox mechanism is designed to permit communication
between the host OS and the firmware running on the device. For our
purposes, we emulate both the firmware, implemented primarily in
cxl-mailbox-utils.c, and the hardware.

No commands are implemented yet.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-device-utils.c   | 122 ++-
 hw/cxl/cxl-mailbox-utils.c  | 164 
 hw/cxl/meson.build  |   1 +
 include/hw/cxl/cxl.h|   3 +
 include/hw/cxl/cxl_device.h |  19 -
 5 files changed, 307 insertions(+), 2 deletions(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index 241f9f82e3..f6c3e0f095 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -44,6 +44,108 @@ static uint64_t dev_reg_read(void *opaque, hwaddr offset, unsigned size)
 return 0;
 }
 
+static uint64_t mailbox_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+switch (size) {
+case 1:
+return cxl_dstate->mbox_reg_state[offset];
+case 2:
+return cxl_dstate->mbox_reg_state16[offset / size];
+case 4:
+return cxl_dstate->mbox_reg_state32[offset / size];
+case 8:
+return cxl_dstate->mbox_reg_state64[offset / size];
+default:
+g_assert_not_reached();
+}
+}
+
+static void mailbox_mem_writel(uint32_t *reg_state, hwaddr offset,
+   uint64_t value)
+{
+switch (offset) {
+case A_CXL_DEV_MAILBOX_CTRL:
+/* fallthrough */
+case A_CXL_DEV_MAILBOX_CAP:
+/* RO register */
+break;
+default:
+qemu_log_mask(LOG_UNIMP,
+  "%s Unexpected 32-bit access to 0x%" PRIx64 " (WI)\n",
+  __func__, offset);
+return;
+}
+
+reg_state[offset / sizeof(*reg_state)] = value;
+}
+
+static void mailbox_mem_writeq(uint64_t *reg_state, hwaddr offset,
+   uint64_t value)
+{
+switch (offset) {
+case A_CXL_DEV_MAILBOX_CMD:
+break;
+case A_CXL_DEV_BG_CMD_STS:
+/* BG not supported */
+/* fallthrough */
+case A_CXL_DEV_MAILBOX_STS:
+/* Read only register, will get updated by the state machine */
+return;
+default:
+qemu_log_mask(LOG_UNIMP,
+  "%s Unexpected 64-bit access to 0x%" PRIx64 " (WI)\n",
+  __func__, offset);
+return;
+}
+
+
+reg_state[offset / sizeof(*reg_state)] = value;
+}
+
+static void mailbox_reg_write(void *opaque, hwaddr offset, uint64_t value,
+  unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+if (offset >= A_CXL_DEV_CMD_PAYLOAD) {
+memcpy(cxl_dstate->mbox_reg_state + offset, &value, size);
+return;
+}
+
+switch (size) {
+case 4:
+mailbox_mem_writel(cxl_dstate->mbox_reg_state32, offset, value);
+break;
+case 8:
+mailbox_mem_writeq(cxl_dstate->mbox_reg_state64, offset, value);
+break;
+default:
+g_assert_not_reached();
+}
+
+if (ARRAY_FIELD_EX32(cxl_dstate->mbox_reg_state32, CXL_DEV_MAILBOX_CTRL,
+ DOORBELL)) {
+cxl_process_mailbox(cxl_dstate);
+}
+}
+
+static const MemoryRegionOps mailbox_ops = {
+.read = mailbox_reg_read,
+.write = mailbox_reg_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 1,
+.max_access_size = 8,
+},
+};
+
 static const MemoryRegionOps dev_ops = {
 .read = dev_reg_read,
 .write = NULL, /* status register is read only */
@@ -84,20 +186,33 @@ void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
   "cap-array", CXL_CAPS_SIZE);
 memory_region_init_io(&cxl_dstate->device, obj, &dev_ops, cxl_dstate,
   "device-status", CXL_DEVICE_STATUS_REGISTERS_LENGTH);
+memory_region_init_io(&cxl_dstate->mailbox, obj, &mailbox_ops, cxl_dstate,
+  "mailbox", CXL_MAILBOX_REGISTERS_LENGTH);
 
 memory_region_add_subregion(&cxl_dstate->device_registers, 0,
 &cxl_dstate->caps);
 memory_region_add_subregion(&cxl_dstate->device_registers,
 CXL_DEVICE_STATUS_REGISTERS_OFFSET,
 &cxl_dstate->device);
+memory_region_add_subregion(&cxl_dstate->device_registers,
+CXL_MAILBOX_REGISTERS_OFFSET,
+
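
The doorbell flow described in this commit message (host rings the
doorbell, the device handles the command, then clears the doorbell and
reports a return code) can be sketched in a few lines. Everything here is
hypothetical: the DemoMailbox type, the doorbell bit position, the opcode
split into command set and command, and the two return codes are
assumptions for illustration, not the register definitions from the
patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CTRL_DOORBELL (1u << 0)   /* assumed bit position */
#define DEMO_SUCCESS       0u          /* hypothetical return codes */
#define DEMO_UNSUPPORTED   1u

typedef struct DemoMailbox {
    uint32_t ctrl;        /* control register holding the doorbell bit */
    uint16_t opcode;      /* assumed: (command set << 8) | command */
    uint16_t ret_code;    /* result reported back to the host */
    unsigned processed;   /* number of commands handled, for the demo */
} DemoMailbox;

/* "Firmware" side: run the command, then clear the doorbell. */
static inline void demo_process(DemoMailbox *mb)
{
    uint8_t set = (uint8_t)(mb->opcode >> 8);
    uint8_t cmd = (uint8_t)(mb->opcode & 0xff);

    /* Only one command is known in this sketch: set 0x03, command 0x0. */
    mb->ret_code = (set == 0x03 && cmd == 0x0) ? DEMO_SUCCESS
                                                : DEMO_UNSUPPORTED;
    mb->processed++;
    mb->ctrl &= ~DEMO_CTRL_DOORBELL;   /* tells the host we are done */
}

/* "Hardware" side: a control register write may ring the doorbell. */
static inline void demo_write_ctrl(DemoMailbox *mb, uint32_t value)
{
    mb->ctrl = value;
    if (mb->ctrl & DEMO_CTRL_DOORBELL) {
        demo_process(mb);
    }
}
```

The host would write the opcode and payload first, ring the doorbell
last, and then poll until the doorbell bit reads back as zero.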

[PATCH v9 01/45] hw/pci/cxl: Add a CXL component type (interface)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

A CXL component is a hardware entity that implements CXL component
registers from the CXL 2.0 spec (8.2.3). Currently these represent 3
general types.
1. Host Bridge
2. Ports (root, upstream, downstream)
3. Devices (memory, other)

A CXL component can be conceptually thought of as a PCIe device with
extra functionality when enumerated and enabled. For this reason, CXL
support builds on existing PCI code paths here, and will continue to do so.

Host bridges will typically need to be handled specially and so they can
implement this newly introduced interface or not. All other components
should implement this interface. Implementing this interface allows the
core PCI code to treat these devices as special where appropriate.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
Reviewed-by: Adam Manzanares 
---
 hw/pci/pci.c | 10 ++
 include/hw/pci/pci.h |  8 
 2 files changed, 18 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index dae9119bfe..a7f5c43587 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -201,6 +201,11 @@ static const TypeInfo pci_bus_info = {
 .class_init = pci_bus_class_init,
 };
 
+static const TypeInfo cxl_interface_info = {
+.name  = INTERFACE_CXL_DEVICE,
+.parent= TYPE_INTERFACE,
+};
+
 static const TypeInfo pcie_interface_info = {
 .name  = INTERFACE_PCIE_DEVICE,
 .parent= TYPE_INTERFACE,
@@ -2182,6 +2187,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
 pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
 }
 
+if (object_class_dynamic_cast(klass, INTERFACE_CXL_DEVICE)) {
+pci_dev->cap_present |= QEMU_PCIE_CAP_CXL;
+}
+
 pci_dev = do_pci_register_device(pci_dev,
  object_get_typename(OBJECT(qdev)),
  pci_dev->devfn, errp);
@@ -2938,6 +2947,7 @@ static void pci_register_types(void)
 type_register_static(&pci_bus_info);
 type_register_static(&pcie_bus_info);
 type_register_static(&conventional_pci_interface_info);
+type_register_static(&cxl_interface_info);
 type_register_static(&pcie_interface_info);
 type_register_static(&pci_device_type_info);
 }
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 3a32b8dd40..98f0d1b844 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -194,6 +194,8 @@ enum {
 QEMU_PCIE_LNKSTA_DLLLA = (1 << QEMU_PCIE_LNKSTA_DLLLA_BITNR),
 #define QEMU_PCIE_EXTCAP_INIT_BITNR 9
 QEMU_PCIE_EXTCAP_INIT = (1 << QEMU_PCIE_EXTCAP_INIT_BITNR),
+#define QEMU_PCIE_CXL_BITNR 10
+QEMU_PCIE_CAP_CXL = (1 << QEMU_PCIE_CXL_BITNR),
 };
 
 #define TYPE_PCI_DEVICE "pci-device"
@@ -201,6 +203,12 @@ typedef struct PCIDeviceClass PCIDeviceClass;
 DECLARE_OBJ_CHECKERS(PCIDevice, PCIDeviceClass,
  PCI_DEVICE, TYPE_PCI_DEVICE)
 
+/*
+ * Implemented by devices that can be plugged on CXL buses. In the spec, this is
+ * actually a "CXL Component", but we name it device to match the PCI naming.
+ */
+#define INTERFACE_CXL_DEVICE "cxl-device"
+
 /* Implemented by devices that can be plugged on PCI Express buses */
 #define INTERFACE_PCIE_DEVICE "pci-express-device"
 
-- 
2.32.0

Re: [PATCH v8 04/46] hw/cxl/device: Introduce a CXL device (8.2.8)

2022-04-04 Thread Adam Manzanares

On Fri, Apr 01, 2022 at 02:30:34PM +0100, Jonathan Cameron wrote:
> On Thu, 31 Mar 2022 22:13:20 +
> Adam Manzanares  wrote:
> 
> > On Wed, Mar 30, 2022 at 06:48:48PM +0100, Jonathan Cameron wrote:
> > > On Tue, 29 Mar 2022 18:13:59 +
> > > Adam Manzanares  wrote:
> > >   
> > > > On Fri, Mar 18, 2022 at 03:05:53PM +, Jonathan Cameron wrote:  
> > > > > From: Ben Widawsky 
> > > > > 
> > > > > A CXL device is a type of CXL component. Conceptually, a CXL device
> > > > > would be a leaf node in a CXL topology. From an emulation perspective,
> > > > > CXL devices are the most complex and so the actual implementation is
> > > > > reserved for discrete commits.
> > > > > 
> > > > > This new device type is specifically catered towards the eventual
> > > > > implementation of a Type3 CXL.mem device, 8.2.8.5 in the CXL 2.0
> > > > > specification.
> > > > > 
> > > > > Signed-off-by: Ben Widawsky 
> > > > > Signed-off-by: Jonathan Cameron 
> > > > > Reviewed-by: Alex Bennée   
> > > 
> > > ...
> > >   
> > > > > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > > > > new file mode 100644
> > > > > index 00..b2416e45bf
> > > > > --- /dev/null
> > > > > +++ b/include/hw/cxl/cxl_device.h
> > > > > @@ -0,0 +1,165 @@
> > > > > +/*
> > > > > + * QEMU CXL Devices
> > > > > + *
> > > > > + * Copyright (c) 2020 Intel
> > > > > + *
> > > > > + * This work is licensed under the terms of the GNU GPL, version 2. 
> > > > > See the
> > > > > + * COPYING file in the top-level directory.
> > > > > + */
> > > > > +
> > > > > +#ifndef CXL_DEVICE_H
> > > > > +#define CXL_DEVICE_H
> > > > > +
> > > > > +#include "hw/register.h"
> > > > > +
> > > > > +/*
> > > > > + * The following is how a CXL device's MMIO space is laid out. The only
> > > > > + * requirement from the spec is that the capabilities array and the capability
> > > > > + * headers start at offset 0 and are contiguously packed. The headers themselves
> > > > > + * provide offsets to the register fields. For this emulation, registers will
> > > > > + * start at offset 0x80 (m == 0x80). No secondary mailbox is implemented which
> > > > > + * means that n = m + sizeof(mailbox registers) + sizeof(device registers).
> > > > 
> > > > What is n here, the start offset of the mailbox registers? This question
> > > > is based on the figure below.
> > > 
> > > I'll expand on this to say
> > > 
> > > means that the offset of the start of the mailbox payload (n) is given by
> > > n = m + sizeof
> > > 
> > > Which means the diagram below is wrong, as n should align with the top
> > > of the mailbox registers.
> > >   
> > > >   
> > > > > + *
> > > > > + * This is roughly described in 8.2.8 Figure 138 of the CXL 2.0 spec 
> > > > >  
> > > I'm going to drop this comment as that figure appears unrelated to me.
> > >   
> > > > > + *
> > > > > + *   +-+
> > > > > + *   | |
> > > > > + *   |Memory Device Registers  |
> > > > > + *   | |
> > > > > + * n + PAYLOAD_SIZE_MAX  ---
> > > > > + *  ^| |
> > > > > + *  || |
> > > > > + *  || |
> > > > > + *  || |
> > > > > + *  || |
> > > > > + *  || Mailbox Payload |
> > > > > + *  || |
> > > > > + *  || |
> > > > > + *  || |
> > > > > + *  |---
> > > > > + *  ||   Mailbox Registers |
> > > > > + *  || |
> > > > > + *  n---
> > > > > + *  ^| |
> > > > > + *  ||Device Registers |
> > > > > + *  || |
> > > > > + *  m-->
> > > > > + *  ^|  Memory Device Capability Header|
> > > > > + *  |---
> > > > > + *  || Mailbox Capability Header   |
> > > > > + *  |-- 
> > > > > + *  || Device Capability Header|
> > > > > + *  |---
> > > > > + *  ||

[PATCH v9 02/45] hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)

2022-04-04 Thread Jonathan Cameron via

From: Ben Widawsky 

A CXL 2.0 component is any entity in the CXL topology. All components
have an analogous function in PCIe. Except for the CXL host bridge, all
have a PCIe config space that is accessible via the common PCIe
mechanisms. CXL components are enumerated via DVSEC fields in the
extended PCIe header space. CXL components will minimally implement some
subset of CXL.mem and CXL.cache registers defined in 8.2.5 of the CXL
2.0 specification. Two headers and a utility library are introduced to
support the minimum functionality needed to enumerate components.

The cxl_pci header manages bits associated with PCI, specifically the
DVSEC and related fields. The cxl_component.h variant has data
structures and APIs that are useful for drivers implementing any of the
CXL 2.0 components. The library takes care of making use of the DVSEC
bits and the CXL.[mem|cache] registers. Per spec, the registers are
little endian.

None of the mechanisms required to enumerate a CXL capable hostbridge
are introduced at this point.

Note that the CXL.mem and CXL.cache registers used are always 4B wide.
It's possible in the future that this constraint will not hold.
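
As a rough illustration of the 4B-wide register point above, here is a
minimal standalone sketch (not the code from this patch) of mapping a byte
offset to an index in a 32-bit register array by dividing by the element
size instead of a hard-coded 4. The names demo_regs, demo_reg_index,
demo_reg_read and demo_reg_write, and the 64-byte block size, are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size for the example; not a value from the patch. */
#define DEMO_REG_BLOCK_BYTES 64

/* Hypothetical stand-in for a block of 4-byte component registers. */
struct demo_regs {
    uint32_t regs[DEMO_REG_BLOCK_BYTES / sizeof(uint32_t)];
};

/*
 * Convert a byte offset into an array index. Dividing by the element
 * size keeps the index correct if the register width ever changes.
 */
static size_t demo_reg_index(const struct demo_regs *r, uint64_t offset)
{
    assert(offset < sizeof(r->regs));
    return (size_t)(offset / sizeof(*r->regs));
}

/* Read the 4-byte register that contains the given byte offset. */
static uint32_t demo_reg_read(const struct demo_regs *r, uint64_t offset)
{
    return r->regs[demo_reg_index(r, offset)];
}

/* Write the 4-byte register that contains the given byte offset. */
static void demo_reg_write(struct demo_regs *r, uint64_t offset,
                           uint32_t value)
{
    r->regs[demo_reg_index(r, offset)] = value;
}
```

With this layout a write at byte offset 0x8 lands in regs[2]. If a wider
register type were introduced later, only the array element type would
need to change.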

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
Reviewed-by: Adam Manzanares 
---
 hw/Kconfig |   1 +
 hw/cxl/Kconfig |   3 +
 hw/cxl/cxl-component-utils.c   | 225 +
 hw/cxl/meson.build |   4 +
 hw/meson.build |   1 +
 include/hw/cxl/cxl.h   |  16 +++
 include/hw/cxl/cxl_component.h | 197 +
 include/hw/cxl/cxl_pci.h   | 146 +
 8 files changed, 593 insertions(+)

diff --git a/hw/Kconfig b/hw/Kconfig
index ad20cce0a9..50e0952889 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -6,6 +6,7 @@ source audio/Kconfig
 source block/Kconfig
 source char/Kconfig
 source core/Kconfig
+source cxl/Kconfig
 source display/Kconfig
 source dma/Kconfig
 source gpio/Kconfig
diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
new file mode 100644
index 0000000000..8e67519b16
--- /dev/null
+++ b/hw/cxl/Kconfig
@@ -0,0 +1,3 @@
+config CXL
+bool
+default y if PCI_EXPRESS
diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
new file mode 100644
index 0000000000..22e52cef17
--- /dev/null
+++ b/hw/cxl/cxl-component-utils.c
@@ -0,0 +1,225 @@
+/*
+ * CXL Utility library for components
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
+
+static uint64_t cxl_cache_mem_read_reg(void *opaque, hwaddr offset,
+   unsigned size)
+{
+CXLComponentState *cxl_cstate = opaque;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+
+if (size == 8) {
+qemu_log_mask(LOG_UNIMP,
+  "CXL 8 byte cache mem registers not implemented\n");
+return 0;
+}
+
+if (cregs->special_ops && cregs->special_ops->read) {
+return cregs->special_ops->read(cxl_cstate, offset, size);
+} else {
+return cregs->cache_mem_registers[offset / sizeof(*cregs->cache_mem_registers)];
+}
+}
+
+static void cxl_cache_mem_write_reg(void *opaque, hwaddr offset, uint64_t value,
+unsigned size)
+{
+CXLComponentState *cxl_cstate = opaque;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+
+if (size == 8) {
+qemu_log_mask(LOG_UNIMP,
+  "CXL 8 byte cache mem registers not implemented\n");
+return;
+}
+if (cregs->special_ops && cregs->special_ops->write) {
+cregs->special_ops->write(cxl_cstate, offset, value, size);
+} else {
+cregs->cache_mem_registers[offset / sizeof(*cregs->cache_mem_registers)] = value;
+}
+}
+
+/*
+ * 8.2.3
+ *   The access restrictions specified in Section 8.2.2 also apply to CXL 2.0
+ *   Component Registers.
+ *
+ * 8.2.2
+ *   • A 32 bit register shall be accessed as a 4 Bytes quantity. Partial
+ *   reads are not permitted.
+ *   • A 64 bit register shall be accessed as a 8 Bytes quantity. Partial
+ *   reads are not permitted.
+ *
+ * As of the spec defined today, only 4 byte registers exist.
+ */
+static const MemoryRegionOps cache_mem_ops = {
+.read = cxl_cache_mem_read_reg,
+.write = cxl_cache_mem_write_reg,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+};
+
+void cxl_component_register_block_init(Object *obj,
+   CXLComponentState *cxl_cstate,
+   const char *type)
+{
+ComponentRegisters *cregs =

[PATCH v9 00/45] CXl 2.0 emulation Support

2022-04-04 Thread Jonathan Cameron via

CI passing both with the full series and at appropriate points
for a partial series merge if desired (at end of each section
tests are introduced)
https://gitlab.com/jic23/qemu/-/pipelines/508396913
Possible partial sets:
1-15 (end with the test of the pxb-cxl host bridge)
16-22 (end with the test for root port and type3 device)
23-39 (end with tests on x86 pc for CFMWS including BIOS table updates)
40-41 (arm64 virt support + simple test case)
42 (documentation - we could pull this forwards to before the arm support)
43-45 (switch support)

Note the gitlab branch also has additional patches on top of these
that will form the part of future postings (PCIe DOE, CDAT,
serial number support and improved fidelity of emulation)
Several people have asked about contributing additional features.
As those come in I'll apply them on top of this series and handle
rebases etc as necessary whilst we seek to get this first set
of patches upstream.

Changes since v8:
 Thanks to Adam Manzanares, Alison Schofield and Mark Cave-Ayland
 for review.
For reference v8 thread at:
https://lore.kernel.org/qemu-devel/20220318150635.24600-1-jonathan.came...@huawei.com/
 
 - Fix crash when no hostmem region provided (from CI)
 - Fix a mid series build bug (from chasing that CI issue)
 - (various patches) Switch the various struct cxl_dvsec_* to typedefs
   CXLDVSECDeviceGPF etc. This reduces line lengths in a patch to add
   write masks for PCI config space that will be part of a follow up
   to this series.
 - (various) Switch away from old style initializers and associated
   renames (Mark) 
 - (patch 2, various) Use sizeof() or local size variable rather than
   hard coding division by 4 or 8 when indexing into register arrays (Adam)
 - (patch 2) Add comment for strange write mask CXL_RAS_UNC_ERR_SEVERITY (Adam)
 - (patch 2) Fix wrong mask for COR_ERR
 - (patch 2) Add a comment explaining less than obvious fact we can use
   a contrived order of capabilities to allow a single number to represent
   which ones should be enabled. (Adam)
 - (patch 2) Wrong version number for RAS cap header (Adam)
 - (patch 2) Wrong space left for HDM decoders (Adam)
 - (patch 2) Fix field of cxl_dvsec_port_extensions to be
   alt_prefetch_limit_high (Adam)
 - (patch 2) Add CXLDVSECDeviceGPF (noticed as part of follow up series prep)
 - (patch 3) Improve docs around the large ASCII art figure (Adam)
 - (patch 3) Rename CXL_DEVICE_REGISTERS_* to CXL_DEVICE_STATUS_REGISTERS *
   to match the specification (Adam)
 - (patch 3) Extra references to the specification (Adam)
 - (patch 3) Rename a few fields in CXL_DEV_BG_CMD_STS to more closely match
   the specification (Adam)
 - (patch 17) Drop stale ifdef (Mark)
 - (patch 19) Fix wrong value of part_info->next_pmem so it now matches
   what the spec requires (Alison)
 - (patch 27) Fix docs to not mention OptsVisitor, to be more detailed
   on what sizes are accepted, provide more detail on what id means and
   update version number (Mark)
 - (patch 27) Use loc_save()/loc_pop() to improve printed error (Mark)
 - (patch 27) Rename config function (Mark)
 - (patch 32) Fix address_space cleanup and move other parts of
   instance_finalize() to pc->exit() to balance what is in pc->realize()
   (Mark)
 - (various) Minor typos and formatting cleanup observed whilst preparing
   series.
Some discussion occurred on allowing for volatile memory support rather than
just PMEM. That is postponed to a future patch set. Also some discussion
on future work coordination.

Mark's suggestion of using PCI BDF for naming unfortunately doesn't
work as they are not constant (or indeed enumerated at all in some cases)

I'm resisting the urge to have this series continue to grow with
additional features on the basis it is already huge and what we have
here is useful + functional.

Updated background info:

Looking in particular for:
* Review of the PCI interactions
* x86 and ARM machine interactions (particularly the memory maps)
* Review of the interleaving approach - is the basic idea
  acceptable?
* Review of the command line interface.
* CXL related review welcome but much of that got reviewed
  in earlier versions and hasn't changed substantially.

TODOs:

* Volatile memory devices (easy but it's more code so left for now).
* Hotplug?  May not need much but it's not tested yet!
* More tests and tighter verification that values written to hardware
  are actually valid - stuff that real hardware would check.
* Testing, testing and more testing.  I have been running a basic
  set of ARM and x86 tests on this, but there is always room for
  more tests and greater automation.
* CFMWS flags as requested by Ben.
* Partitioning support - ability to change the balance of volatile
  and non volatile memory on demand.
* Trace points as suggested by Mark to help with debugging memory
  interleaving setup.

Why do we want QEMU emulation of CXL?

As Ben stated in V3, QEMU support has been critical to getting OS
software written given lack of

Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-04 Thread Quentin Perret

On Friday 01 Apr 2022 at 12:56:50 (-0700), Andy Lutomirski wrote:
> On Fri, Apr 1, 2022, at 7:59 AM, Quentin Perret wrote:
> > On Thursday 31 Mar 2022 at 09:04:56 (-0700), Andy Lutomirski wrote:
> 
> 
> > To answer your original question about memory 'conversion', the key
> > thing is that the pKVM hypervisor controls the stage-2 page-tables for
> > everyone in the system, all guests as well as the host. As such, a page
> > 'conversion' is nothing more than a permission change in the relevant
> > page-tables.
> >
> 
> So I can see two different ways to approach this.
> 
> One is that you split the whole address space in half and, just like SEV and 
> TDX, allocate one bit to indicate the shared/private status of a page.  This 
> makes it work a lot like SEV and TDX.
>
> The other is to have shared and private pages be distinguished only by their 
> hypercall history and the (protected) page tables.  This saves some address 
> space and some page table allocations, but it opens some cans of worms too.  
> In particular, the guest and the hypervisor need to coordinate, in a way that 
> the guest can trust, to ensure that the guest's idea of which pages are 
> private matches the host's.  This model seems a bit harder to support nicely 
> with the private memory fd model, but not necessarily impossible.

Right. Perhaps one thing I should clarify as well: pKVM (as opposed to
TDX) has only _one_ page-table per guest, and it is controlled by the
hypervisor only. So the hypervisor needs to be involved for both shared
and private mappings. As such, shared pages have relatively similar
constraints when it comes to host mm stuff --  we can't migrate shared
pages or swap them out without getting the hypervisor involved.

> Also, what are you trying to accomplish by having the host userspace mmap 
> private pages?

What I would really like to have is non-destructive in-place conversions
of pages. mmap-ing the pages that have been shared back felt like a good
fit for the private=>shared conversion, but in fact I'm not all that
opinionated about the API as long as the behaviour and the performance
are there. Happy to look into alternatives.

FWIW, there are a couple of reasons why I'd like to have in-place
conversions:

 - one goal of pKVM is to migrate some things away from the Arm
   Trustzone environment (e.g. DRM and the likes) and into protected VMs
   instead. This will give Linux a fighting chance to defend itself
   against these things -- they currently have access to _all_ memory.
   And transitioning pages between Linux and Trustzone (donations and
   shares) is fast and non-destructive, so we really do not want pKVM to
   regress by requiring the hypervisor to memcpy things;

 - it can be very useful for protected VMs to do shared=>private
   conversions. Think of a VM receiving some data from the host in a
   shared buffer, and then it wants to operate on that buffer without
   risking leaking confidential information in a transient state. In
   that case the most logical thing to do is to convert the buffer back
   to private, do whatever needs to be done on that buffer (decrypting a
   frame, ...), and then share it back with the host to consume it;

 - similar to the previous point, a protected VM might want to
   temporarily turn a buffer private to avoid ToCToU issues;

 - once we're able to do device assignment to protected VMs, this might
   allow DMA-ing to a private buffer, and make it shared later w/o
   bouncing.

And there is probably more.

IIUC, the private fd proposal as it stands requires shared and private
pages to come from entirely distinct places. So it's not entirely clear
to me how any of the above could be supported without having the
hypervisor memcpy the data during conversions, which I really don't want
to do for performance reasons.
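
To make the in-place conversion idea concrete, here is a deliberately tiny
toy model in C. It is not pKVM code and does not model real stage-2
page-tables: it only illustrates that a conversion can be a change of
per-page state held by the hypervisor, with the page contents left where
they are. Every name in it (demo_guest, demo_convert, demo_host_read, the
page count and page size) is a hypothetical made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 64   /* toy page size, for illustration only */
#define DEMO_NR_PAGES  4    /* toy guest size, for illustration only */

enum demo_page_state { DEMO_PRIVATE = 0, DEMO_SHARED };

/* Toy guest: page contents plus per-page state owned by the hypervisor. */
struct demo_guest {
    uint8_t mem[DEMO_NR_PAGES][DEMO_PAGE_SIZE];
    enum demo_page_state state[DEMO_NR_PAGES];
};

/*
 * Convert a page between private and shared. Only the state changes;
 * the data is neither copied nor cleared.
 */
static bool demo_convert(struct demo_guest *g, size_t pfn,
                         enum demo_page_state to)
{
    if (pfn >= DEMO_NR_PAGES) {
        return false;
    }
    g->state[pfn] = to;
    return true;
}

/* Host-side read: permitted only while the page is shared. */
static bool demo_host_read(const struct demo_guest *g, size_t pfn,
                           size_t off, uint8_t *out)
{
    if (pfn >= DEMO_NR_PAGES || off >= DEMO_PAGE_SIZE) {
        return false;
    }
    if (g->state[pfn] != DEMO_SHARED) {
        return false;
    }
    *out = g->mem[pfn][off];
    return true;
}
```

In this model a guest could fill a page while it is private, share it so
the host can read it, and later take it back, all without a memcpy. A
design where shared and private pages come from distinct backing stores
would instead need a copy at each conversion step.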

> Is the idea that multiple guests could share the same page until such time as 
> one of them tries to write to it?

That would certainly be possible to implement in the pKVM
environment with the right tracking, so I think it is worth considering
as a future goal.

Thanks,
Quentin
