Comment below:
> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Friday, August 16, 2019 12:21 AM
> To: Laszlo Ersek <ler...@redhat.com>; de...@edk2.groups.io; Yao, Jiewen
> <jiewen....@intel.com>
> Cc: edk2-rfc-groups-io <r...@edk2.groups.io>; qemu devel list
> <qemu-devel@nongnu.org>; Igor Mammedov <imamm...@redhat.com>;
> Chen, Yingwen <yingwen.c...@intel.com>; Nakajima, Jun
> <jun.nakaj...@intel.com>; Boris Ostrovsky <boris.ostrov...@oracle.com>;
> Joao Marcal Lemos Martins <joao.m.mart...@oracle.com>; Phillip Goerl
> <phillip.go...@oracle.com>
> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
>
> On 15/08/19 17:00, Laszlo Ersek wrote:
> > On 08/14/19 16:04, Paolo Bonzini wrote:
> >> On 14/08/19 15:20, Yao, Jiewen wrote:
> >>>> - Does this part require a new branch somewhere in the OVMF SEC code?
> >>>> How do we determine whether the CPU executing SEC is BSP or
> >>>> hot-plugged AP?
> >>> [Jiewen] I think this is blocked from hardware perspective, since the
> >>> first instruction.
> >>> There are some hardware specific registers can be used to determine if
> >>> the CPU is new added.
> >>> I don’t think this must be same as the real hardware.
> >>> You are free to invent some registers in device model to be used in
> >>> OVMF hot plug driver.
> >>
> >> Yes, this would be a new operation mode for QEMU, that only applies to
> >> hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in
> >> fact it doesn't reply to anything at all.
> >>
> >>>> - How do we tell the hot-plugged AP where to start execution? (I.e. that
> >>>> it should execute code at a particular pflash location.)
> >>> [Jiewen] Same real mode reset vector at FFFF:FFF0.
> >>
> >> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
> >> QEMU. The AP does not start execution at all when it is unplugged, so
> >> no cache-as-RAM etc.
> >>
> >> We only need to modify QEMU so that hot-plugged APs do not reply to
> >> INIT/SIPI/SMI.
> >>
> >>> I don’t think there is problem for real hardware, who always has CAR.
> >>> Can QEMU provide some CPU specific space, such as MMIO region?
> >>
> >> Why is a CPU-specific region needed if every other processor is in SMM
> >> and thus trusted?
> >
> > I was going through the steps Jiewen and Yingwen recommended.
> >
> > In step (02), the new CPU is expected to set up RAM access. In step
> > (03), the new CPU, executing code from flash, is expected to "send board
> > message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
> > message." For that action, the new CPU may need a stack (minimally if we
> > want to use C function calls).
> >
> > Until step (03), there had been no word about any other (= pre-plugged)
> > CPUs (more precisely, Jiewen even confirmed "No impact to other
> > processors"), so I didn't assume that other CPUs had entered SMM.
> >
> > Paolo, I've attempted to read Jiewen's response, and yours, as carefully
> > as I can. I'm still very confused. If you have a better understanding,
> > could you please write up the 15-step process from the thread starter
> > again, with all QEMU customizations applied? Such as, unnecessary steps
> > removed, and platform specifics filled in.
>
> Sure.
>
> (01a) QEMU: create new CPU. The CPU already exists, but it does not
> start running code until unparked by the CPU hotplug controller.
>
> (01b) QEMU: trigger SCI
>
> (02-03) no equivalent
>
> (04) Host CPU: (OS) execute GPE handler from DSDT
>
> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
> will not enter SMM because SMI is disabled)
>
> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> rebase code.
>
> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
> new CPU
>
> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.

[Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
restriction that INIT/SIPI/SIPI can only be sent in SMM.

> (08a) New CPU: (Low RAM) Enter protected mode.

[Jiewen] NOTE: The new CPU still cannot use any physical memory, because the
INIT/SIPI/SIPI may be sent by a malicious CPU in a non-SMM environment.

> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
>
> (09) Host CPU: (SMM) Send SMI to the new CPU only.
>
> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
> TSEG.
>
> (11) Host CPU: (SMM) Restore 38000.
>
> (12) Host CPU: (SMM) Update located data structure to add the new CPU
> information. (This step will involve CPU_SERVICE protocol)
>
> (13) New CPU: (Flash) do whatever other initialization is needed
>
> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
>
> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.
>
>
> In other words, the cache-as-RAM phase of 02-03 is replaced by the
> INIT-SIPI-SIPI sequence of 07b-08a-08b.

[Jiewen] I am OK with this proposal.
I think the rule is the same - the new CPU CANNOT touch any system memory,
no matter whether it starts from the reset vector or from INIT/SIPI/SIPI.
Or I would say: if the new CPU wants to touch some memory before the first
SMI, the memory should be CPU specific or on the flash.

> >> The QEMU DSDT could be modified (when secure boot is in effect) to OUT
> >> to 0xB2 when hotplug happens. It could write a well-known value to
> >> 0xB2, to be read by an SMI handler in edk2.
> >
> > I dislike involving QEMU's generated DSDT in anything SMM (even
> > injecting the SMI), because the AML interpreter runs in the OS.
> >
> > If a malicious OS kernel is a bit too enlightened about the DSDT, it
> > could willfully diverge from the process that we design. If QEMU
> > broadcast the SMI internally, the guest OS could not interfere with that.
> >
> > If the purpose of the SMI is specifically to force all CPUs into SMM
> > (and thereby force them into trusted state), then the OS would be
> > explicitly counter-interested in carrying out the AML operations from
> > QEMU's DSDT.
>
> But since the hotplug controller would only be accessible from SMM,
> there would be no other way to invoke it than to follow the DSDT's
> instruction and write to 0xB2. FWIW, real hardware also has plenty of
> 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
> access).
>
> Paolo
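
To check my own reading of the revised sequence, here is a toy model of the new CPU as a small state machine. It is only a sketch of the ordering in steps (01a) through (15), not QEMU or edk2 code. The class, state, and method names (NewCpu, State, unpark, init_sipi_sipi, smi) are invented for illustration, and the `sender_in_smm` flag stands in for "the hotplug controller is only accessible from SMM". What a halted CPU does with a second INIT/SIPI/SIPI is not modeled.

```python
from enum import Enum, auto


class State(Enum):
    PARKED = auto()    # (01a) created, ignores INIT/SIPI/SMI
    UNPARKED = auto()  # (07a) enabled through the hotplug controller
    HALTED = auto()    # (08b) signalled the host, sitting in cli;hlt
    REBASED = auto()   # (10) ran code at 38000, SMBASE moved to TSEG
    ONLINE = auto()    # (15) pulled in by the OS


class NewCpu:
    """Hypothetical model of the hot-plugged CPU; names are illustrative."""

    def __init__(self):
        self.state = State.PARKED

    def unpark(self, sender_in_smm):
        # Assumption from the thread: the controller is reachable from SMM only.
        if not sender_in_smm:
            raise PermissionError("hotplug controller is SMM-only")
        if self.state is State.PARKED:
            self.state = State.UNPARKED

    def init_sipi_sipi(self):
        # Anyone may send this, including a malicious OS, so it must be
        # harmless while the CPU is still parked.
        if self.state is State.UNPARKED:
            self.state = State.HALTED   # (07b)-(08b)
            return True
        if self.state is State.REBASED:
            self.state = State.ONLINE   # (14)-(15)
            return True
        return False                    # ignored (or not modeled)

    def smi(self):
        # (09)-(10): only meaningful once the CPU waits in cli;hlt.
        if self.state is State.HALTED:
            self.state = State.REBASED
            return True
        return False
```

Walking it through: a parked CPU ignores both init_sipi_sipi() and smi(); after unpark(sender_in_smm=True) the sequence init_sipi_sipi(), smi(), init_sipi_sipi() ends in State.ONLINE. The model does not capture the memory rule (no system memory before the first SMI); it only shows that the OS cannot advance the CPU past PARKED without going through SMM.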