On Wed, 29 Apr 2026 14:48:35 +0100
Joshua Lant <[email protected]> wrote:

> Signed-off-by: Joshua Lant <[email protected]>
Hi Joshua,

Sorry it's taken me a while to get to this!  I blame too much activity
on other open source projects! :)

I've mused in the past on how to do the command lines for these,
so some thoughts are based on that - feel free to argue why
the structure you have here works better.

When I get through the series I may well change my mind on some
of what follows ;)


> ---
>  docs/system/devices/cxl.rst | 90 ++++++++++++++++++++++++++++++++++---
>  1 file changed, 85 insertions(+), 5 deletions(-)
> 
> diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
> index 32b1b5d773..9e8452e576 100644
> --- a/docs/system/devices/cxl.rst
> +++ b/docs/system/devices/cxl.rst
> @@ -119,11 +119,11 @@ and associated component register access via PCI bars.
>  CXL Switch
>  ~~~~~~~~~~
>  Here we consider a simple CXL switch with only a single
> -virtual hierarchy. Whilst more complex devices exist, their
> -visibility to a particular host is generally the same as for
> -a simple switch design. Hosts often have no awareness
> -of complex rerouting and device pooling, they simply see
> -devices being hot added or hot removed.
> +virtual hierarchy. Whilst more complex devices exist (see VCS
> +Switching below), their visibility to a particular host is
> +generally the same as for a simple switch design. Hosts often
> +have no awareness of complex rerouting and device pooling,
> +they simply see devices being hot added or hot removed.
>  
>  A CXL switch has a similar architecture to those in PCIe,
>  with a single upstream port, internal PCI bus and multiple
> @@ -467,6 +467,86 @@ Example configuration:
>  Guest OS communication with the MCTP CCI can then be established using standard
>  MCTP configuration tools.
>  
> +CXL Multi-VCS Switching
> +-----------------------
> +
> +The cxl-vcs-switch object allows for a Fabric Manager to dynamically reconfigure
> +the switching within a multi-upstream port CXL/PCIe topology, This moves beyond
> +the static switching configuration described above. The use of vcs=X on an
> +endpoint device indicates that it should be hidden from guests at boot.

That bit seems rather unintuitive.  EPs shouldn't really be involved in this
at all. I guess you are using them as a proxy for a physical downstream port?
Interesting idea, if a bit unintuitive. I wonder if we can put an explicit
physical DSP device in. When linked it just proxies the vPPB.

Maybe we can get away without that, but it leaves us with no physical port
hotplug as we can't connect an empty physical downstream port to a VCS.

> Each
> +upstream port with vcs=X set will conceptually become an upstream PPB. Any
> +downstream port that is connected to an upstream port with vcs=X set will
> +automatically become a vPPB for that VCS. The overall cxl-virtual-switch has a
Neat not to have to set it for the DSPs, but I think we will need them to
grow new functionality so maybe a different device type is good.

> +single CCI mailbox used for config/status of all ports within the switch.

Need to support both MCTP and switch-cci but that should be fine.

> +Setting local-fm=true indicates that this QEMU instance has the CCI mailbox
> +attached. Setting it false will create listeners for commands from a remote
> +QEMU process (yet to be implemented).

Nice, but make that the default for now (and drop the parameter).
Absence of a connected CCI might be sufficient, though that's a bit ugly
to check.

> +
> +An example of how the topology is described on the CLI is shown below:
> +
> +  -object cxl-vcs-switch,id=vcs0,usp-ppbs=2,dsp-ppbs=4,local-fm=true \
Interesting.  I'd kind of like it to be a device, but it has no presence
on any bus in and of itself (arguably it is on a whole load of them). So maybe not.

> +  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.0,hdm_for_passthrough=true \

Small side note - avoid the passthrough trick. It means a bunch of code
paths aren't exercised and has hidden various OS bugs.

> +  -device cxl-rp,port=0,bus=cxl.0,id=root_port1,chassis=0,slot=1 \
> +  -device pxb-cxl,bus_nr=22,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \
> +  -device cxl-rp,port=0,bus=cxl.1,id=root_port2,chassis=1,slot=1 \
> +  -device cxl-upstream,port=0,sn=1234,bus=root_port1,id=us0,addr=0.0,multifunction=on,vcs=vcs0,usppb=0 \
> +  -device cxl-upstream,port=0,sn=5678,bus=root_port2,id=us1,addr=0.0,multifunction=on,vcs=vcs0,usppb=1 \

How can we have two upstream ports in a single VCS?  To me those are separate
VCSs, where a VCS is normally a tree topology below a given USP.

I think we have a terminology problem.  If I read this right you are using VCS
to mean the whole physical switch?  It's been a little while, but I don't think
that corresponds at all to its meaning in the CXL Spec. Your VCS0/1 below
are right.
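
To make the terminology concrete, here is a rough sketch of the model I have
in mind: one physical switch containing several VCSs, each VCS being the tree
below exactly one USP. It is illustrative Python only; the class and method
names are made up for this mail and don't correspond to anything in QEMU or
the spec.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VCS:
    """A virtual CXL switch: the tree below exactly one upstream port."""
    usp: str                                        # id of the single USP
    vppbs: List[str] = field(default_factory=list)  # virtual downstream ports


@dataclass
class PhysicalSwitch:
    """The whole box: several VCSs inside one physical switch."""
    vcss: Dict[str, VCS] = field(default_factory=dict)

    def add_vcs(self, name: str, usp: str, vppbs: List[str]) -> None:
        # A USP heads exactly one VCS; reject a second VCS on the same USP.
        if any(v.usp == usp for v in self.vcss.values()):
            raise ValueError(f"USP {usp!r} already heads another VCS")
        self.vcss[name] = VCS(usp, list(vppbs))

    def vcs_of(self, usp: str) -> str:
        """Return the name of the VCS headed by the given USP."""
        for name, vcs in self.vcss.items():
            if vcs.usp == usp:
                return name
        raise KeyError(usp)


# Hypothetical layout mirroring the diagram in the patch: two USPs, two VCSs.
switch = PhysicalSwitch()
switch.add_vcs("VCS0", "us0", ["dsp0", "dsp1"])
switch.add_vcs("VCS1", "us1", ["dsp2", "dsp3"])
```

With that naming, the object on the command line describes the PhysicalSwitch,
and each USP implies its own VCS.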



> +  -device cxl-switch-mailbox-cci,bus=root_port1,addr=0.3,target=vcs0 \
> +  -device usb-cxl-mctp,bus=ehci.0,id=usb0,target=vcs0 \
> +  -device cxl-downstream,port=0,bus=us0,id=dsp0,slot=3 \
> +  -device cxl-downstream,port=1,bus=us0,id=dsp1,slot=4 \
> +  -device cxl-downstream,port=0,bus=us1,id=dsp2,slot=7 \
> +  -device cxl-downstream,port=1,bus=us1,id=dsp3,slot=8 \
Ok. So these only know they are virtual because they are connected to a virtual
USP. Might be enough - or we might want to make that more explicit via
a new device type.

> +  -device cxl-type3,persistent-memdev=cxl-mem1,id=cxl-ep1,lsa=cxl-lsa1,sn=99,vcs=vcs0,dsppb=0 \
> +  -device cxl-type3,persistent-memdev=cxl-mem2,id=cxl-ep2,lsa=cxl-lsa2,sn=100,vcs=vcs0,dsppb=1 \
> +  -device cxl-type3,persistent-memdev=cxl-mem3,id=cxl-ep3,lsa=cxl-lsa3,sn=101,vcs=vcs0,dsppb=2 \
> +  -device cxl-type3,persistent-memdev=cxl-mem4,id=cxl-ep4,lsa=cxl-lsa4,sn=102,vcs=vcs0,dsppb=3 \
This I mention above. I 'think' you are using the dsppb to instantiate
something that is pretending to be the physical DSP.
I haven't yet read the series, but gut feeling is that will make the querying
of link properties etc rather different from the normal case.

> +  -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=8G,cxl-fmw.1.targets.0=cxl.1,cxl-fmw.1.size=8G
> +
> +Example topology involving VCS switching::
> +
> +               +--------------------+  +--------------------+
> +               |   Host Bridge 0    |  |   Host Bridge 1    |
> +               +----------+---------+  +----------+---------+
> +    +-------+             |                       |
> +    | MCTP  |             |                       |
> +    | USB/  |  +----------+---------+  +----------+---------+
> +    | I2C   |  |    Root Port 0     |  |    Root Port 1     |
> +    +-----+-+  +----------+---------+  +----------+---------+
> +          |               |                       |
> +          |               |                       |
> +   +------|---------------+-----------------------+-----------------------+
> +   |    +-+--------+      |  cxl-vcs-switch (vcs0)|                       |
> +   | +--| CCI MBOX |---*  |                       |                       |
> +   | |  +----------+      |                       |                       |
> +   | |  +-----------------+--------+      +-------+------------------+    |
> +   | +--+                 |   VCS0 |  *---+       |             VCS1 |    |
> +   |    | +---------------+------+ |      | +-----+----------------+ |    |
> +   |    | |                      | |      | |                      | |    |
> +   |    | |        USP 0         | |      | |        USP 1         | |    |
> +   |    | |                      | |      | |                      | |    |
> +   |    | +----+------------+----+ |      | +----+------------+----+ |    |
> +   |    |      |            |      |      |      |            |      |    |
> +   |    | +----+----+  +----+----+ |      | +----+----+  +----+----+ |    |
> +   |    | |  DSP 0  |  |  DSP 1  | |      | |  DSP 2  |  |  DSP 3  | |    |
> +   |    | |(vPPB 0) |  |(vPPB 1) | |      | |(vPPB 0) |  |(vPPB 1) | |    |
> +   |    | |         |  |         | |      | |         |  |         | |    |
> +   |    | +---------+  +---------+ |      | +---------+  +----+----+ |    |
> +   |    +--------------------------+      +-------------------+------+    |
> +   |                                                          |           |
> +   |           +----------------------------------------------+           |
> +   |           |                                                          |
> +   |           |            -                    -            -           |
> +   +-----------|------------|--------------------|------------|-----------+
> +               |            |                    |            |
> +          +---------+  +---------+          +---------+  +---------+
> +          |CXL/PCIe |  |CXL/PCIe |          |CXL/PCIe |  |CXL/PCIe |
> +          |  EP 0   |  |  EP 1   |          |  EP 2   |  |  EP 3   |
> +          | (PPB0)  |  | (PPB1)  |          | (PPB2)  |  | (PPB3)  |
> +          +---------+  +---------+          +---------+  +---------+
> +                 PPB0 Bound to VCS1, vPPB1. Others unbound...
> +
Good to have the diagram as it makes it easier to discuss.

What you have here is a bit of a hack because only some entities created
exist in the command line - the others are spun up implicitly.  I suspect
we really want to make them explicit.  The one thing I never looked into in
the following is how hard it would be to poke a vDSP in front of a physical
DSP and basically proxy stuff through or not.  Some stuff will be programmed
at boot (windows etc for hotplug later) but other stuff will fire in the hotplug
flow on an attach of a physical port. Will need some care and stitching up
memory regions across the boundary.

The command line I'd be looking at for this as a target (feel free to shoot
at it) would be something like the following (I went with one PXB - but need
to test both options).
Note some of this is probably garbage as I haven't checked the parameters are right.
 -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.0 \
 -device cxl-rp,bus=cxl.0,id=root_port1...
 -device cxl-rp,bus=cxl.0,id=root_port2..
 -device cxl-upstream,port=0,sn=1234,bus=root_port1,id=us0,addr=0.0,multifunction=on,virtual=on \
 -device cxl-upstream,port=0,sn=5678,bus=root_port2,id=us1,addr=0.0,virtual=on \

#note I extended current target to a list
 -device cxl-virtual-downstream,vport=0,bus=us0,id=vppb0 \
 -device cxl-virtual-downstream,vport=1,bus=us0,id=vppb1 \
 -device cxl-virtual-downstream,vport=2,bus=us0,id=vppb2 \
 -device cxl-virtual-downstream,vport=0,bus=us1,id=vppb3 \
 -device cxl-virtual-downstream,vport=1,bus=us1,id=vppb4 \
 -device cxl-virtual-downstream,vport=2,bus=us1,id=vppb5 \
# Note more virtual ports than physical - likely common situation.
 -object cxl-switch,usps.0=us0,usps.1=us1,id=vsw0 \
#list of usps so we can navigate downwards from this.
 -device cxl-switch-mailbox-cci,id=swcci0,bus=root_port1,multifunction=on,target=vsw0 \
# Maybe hang the unconnected physical dsps on a bus created by the cxl-switch?
 -device cxl-downstream,port=0,bus=vsw0,id=dsp0,slot=3 \
 -device cxl-downstream,port=1,bus=vsw0,id=dsp1,slot=4 \
 -device cxl-downstream,port=2,bus=vsw0,id=dsp2,slot=7 \
 -device cxl-downstream,port=3,bus=vsw0,id=dsp3,slot=8 \
#ideally a device but need to think where to hang it.
 -device cxl-type3,persistent-memdev=cxl-mem1,id=cxl-ep1,lsa=cxl-lsa1,sn=99,bus=dsp0 \
 -device cxl-type3,persistent-memdev=cxl-mem2,id=cxl-ep2,lsa=cxl-lsa2,sn=100,bus=dsp1 \
 -device cxl-type3,persistent-memdev=cxl-mem3,id=cxl-ep3,lsa=cxl-lsa3,sn=101,bus=dsp2 \
#note not all DSPs have anything on them.
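
Since command lines like this are easy to get subtly wrong by hand, a quick
sanity check can help before handing one to QEMU. The sketch below is plain
string parsing in Python, not a QEMU tool. It assumes that a bus is named after
the id of the device or object that provides it (as in the examples in this
thread), that things are declared before they are used, and that "#" comment
lines have been stripped. cxl-virtual-downstream is only the hypothetical
device type proposed above.

```python
from typing import Dict, List


def parse_devices(cmdline: str) -> List[Dict[str, str]]:
    """Return one property dict per -device/-object argument."""
    tokens = cmdline.replace("\\\n", " ").split()
    entries = []
    for flag, arg in zip(tokens, tokens[1:]):
        if flag not in ("-device", "-object"):
            continue
        parts = arg.split(",")
        props = {"_type": parts[0]}
        for part in parts[1:]:
            key, _, value = part.partition("=")
            props[key] = value
        entries.append(props)
    return entries


def check(entries: List[Dict[str, str]], known=("pcie.0",)) -> List[str]:
    """Report duplicate ids and bus= references to nothing declared so far."""
    problems = []
    seen = set(known)
    for entry in entries:
        bus = entry.get("bus")
        if bus is not None and bus not in seen:
            problems.append(f"{entry['_type']}: unknown bus {bus!r}")
        dev_id = entry.get("id")
        if dev_id is not None:
            if dev_id in seen:
                problems.append(f"duplicate id {dev_id!r}")
            seen.add(dev_id)
    return problems


# Deliberately broken example: a repeated id and a misspelt bus name.
example = (
    "-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.0 "
    "-device cxl-rp,bus=cxl.0,id=root_port1 "
    "-device cxl-upstream,bus=root_port1,id=us0 "
    "-device cxl-virtual-downstream,vport=0,bus=us0,id=vppb0 "
    "-device cxl-virtual-downstream,vport=1,bus=us0,id=vppb0 "
    "-device cxl-switch-mailbox-cci,bus=root_por1,target=vsw0"
)
problems = check(parse_devices(example))
```

Here problems holds one entry for the repeated vppb0 id and one for the
unknown bus root_por1.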

Few reasons for this structure.
1) The unconnected physical port - we want to make sure physical hotplug works
   both when not associated with a VCS and when it is.
2) We need to be able to talk to EPs via FM interfaces when they aren't
   connected. Given we have to make that look like PCI, let's make it PCI.
   I'm not sure how much hackery that will take as we'll need to do some level
   of enumeration from the switch controller.  Only need that once we want to
   do more than check training etc though - so maybe a job for another day.
   In theory we can do everything with devices in that state (albeit slowly)
   so would need all the addresses programmed etc. Not as general as current
   discussions on enumerating the full PCI bus in QEMU as it's all direct
   connect.
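
As a rough illustration of the states those two points imply, here is a small
Python sketch of a physical DSP that can be empty or populated, and bound to a
vPPB or not. The names (PhysicalDSP, hot_add, bind) are invented for this mail
and are not QEMU code. The FM can always see what is plugged in, while a host
only sees it once the port is bound.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PhysicalDSP:
    port: int
    endpoint: Optional[str] = None              # what is plugged in, if anything
    bound_to: Optional[Tuple[str, int]] = None  # (USP id, vPPB index) when bound

    def fm_view(self) -> Optional[str]:
        """The FM can query the physical port whether or not it is bound."""
        return self.endpoint

    def host_view(self) -> Optional[str]:
        """A host only sees the endpoint through a bound vPPB."""
        return self.endpoint if self.bound_to is not None else None


def hot_add(dsp: PhysicalDSP, endpoint: str) -> bool:
    """Plug an endpoint in; return True if a host would see a hotplug event."""
    if dsp.endpoint is not None:
        raise ValueError(f"port {dsp.port} is already populated")
    dsp.endpoint = endpoint
    return dsp.bound_to is not None


def bind(dsp: PhysicalDSP, usp: str, vppb: int) -> bool:
    """Bind the port to a vPPB; return True if a device appears to the host."""
    if dsp.bound_to is not None:
        raise ValueError(f"port {dsp.port} is already bound")
    dsp.bound_to = (usp, vppb)
    return dsp.endpoint is not None


# Binding an empty port is allowed; the later hot add is then host visible.
empty = PhysicalDSP(port=3)
bind_event = bind(empty, "us1", 1)
plug_event = hot_add(empty, "cxl-ep4")

# Hot add on an unbound port is visible to the FM only.
unbound = PhysicalDSP(port=0)
quiet_event = hot_add(unbound, "cxl-ep1")
```

The first case is the one that is hard to express when the EP itself stands in
for the physical port.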

Anyhow, it's fiddly with this scheme, but I think it's a little more general
than your current one and a closer representation of the hardware, which will
matter as we add all the introspection stuff etc in the FMAPI.

Jonathan



> +
>  References
>  ----------
>  

