> -----Original Message-----
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Saturday, August 17, 2019 6:20 AM
> To: Laszlo Ersek <ler...@redhat.com>
> Cc: Yao, Jiewen <jiewen....@intel.com>; Paolo Bonzini
> <pbonz...@redhat.com>; de...@edk2.groups.io; edk2-rfc-groups-io
> <r...@edk2.groups.io>; qemu devel list <qemu-devel@nongnu.org>; Igor
> Mammedov <imamm...@redhat.com>; Chen, Yingwen
> <yingwen.c...@intel.com>; Nakajima, Jun <jun.nakaj...@intel.com>; Boris
> Ostrovsky <boris.ostrov...@oracle.com>; Joao Marcal Lemos Martins
> <joao.m.mart...@oracle.com>; Phillip Goerl <phillip.go...@oracle.com>
> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
>
> On Fri, 16 Aug 2019 22:15:15 +0200
> Laszlo Ersek <ler...@redhat.com> wrote:
>
> > +Alex (direct question at the bottom)
> >
> > On 08/16/19 09:49, Yao, Jiewen wrote:
> > > below
> > >
> > >> -----Original Message-----
> > >> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> > >> Sent: Friday, August 16, 2019 3:20 PM
> > >> To: Yao, Jiewen <jiewen....@intel.com>; Laszlo Ersek
> > >> <ler...@redhat.com>; de...@edk2.groups.io
> > >> Cc: edk2-rfc-groups-io <r...@edk2.groups.io>; qemu devel list
> > >> <qemu-devel@nongnu.org>; Igor Mammedov <imamm...@redhat.com>;
> > >> Chen, Yingwen <yingwen.c...@intel.com>; Nakajima, Jun
> > >> <jun.nakaj...@intel.com>; Boris Ostrovsky <boris.ostrov...@oracle.com>;
> > >> Joao Marcal Lemos Martins <joao.m.mart...@oracle.com>; Phillip Goerl
> > >> <phillip.go...@oracle.com>
> > >> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
> > >>
> > >> On 16/08/19 04:46, Yao, Jiewen wrote:
> > >>> Comment below:
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> > >>>> Sent: Friday, August 16, 2019 12:21 AM
> > >>>> To: Laszlo Ersek <ler...@redhat.com>; de...@edk2.groups.io; Yao, Jiewen
> > >>>> <jiewen....@intel.com>
> > >>>> Cc: edk2-rfc-groups-io <r...@edk2.groups.io>; qemu devel list
> > >>>> <qemu-devel@nongnu.org>; Igor Mammedov <imamm...@redhat.com>;
> > >>>> Chen, Yingwen <yingwen.c...@intel.com>; Nakajima, Jun
> > >>>> <jun.nakaj...@intel.com>; Boris Ostrovsky <boris.ostrov...@oracle.com>;
> > >>>> Joao Marcal Lemos Martins <joao.m.mart...@oracle.com>; Phillip Goerl
> > >>>> <phillip.go...@oracle.com>
> > >>>> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
> > >>>>
> > >>>> On 15/08/19 17:00, Laszlo Ersek wrote:
> > >>>>> On 08/14/19 16:04, Paolo Bonzini wrote:
> > >>>>>> On 14/08/19 15:20, Yao, Jiewen wrote:
> > >>>>>>>> - Does this part require a new branch somewhere in the OVMF SEC code?
> > >>>>>>>>   How do we determine whether the CPU executing SEC is BSP or
> > >>>>>>>>   hot-plugged AP?
> > >>>>>>> [Jiewen] I think this is blocked from hardware perspective, since the first instruction.
> > >>>>>>> There are some hardware-specific registers that can be used to determine if the CPU is newly added.
> > >>>>>>> I don’t think this must be the same as the real hardware.
> > >>>>>>> You are free to invent some registers in device model to be used in OVMF hot plug driver.
> > >>>>>>
> > >>>>>> Yes, this would be a new operation mode for QEMU, that only applies to
> > >>>>>> hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in
> > >>>>>> fact it doesn't reply to anything at all.
> > >>>>>>
> > >>>>>>>> - How do we tell the hot-plugged AP where to start execution? (I.e. that
> > >>>>>>>>   it should execute code at a particular pflash location.)
> > >>>>>>> [Jiewen] Same real mode reset vector at FFFF:FFF0.
> > >>>>>>
> > >>>>>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
> > >>>>>> QEMU. The AP does not start execution at all when it is unplugged, so
> > >>>>>> no cache-as-RAM etc.
> > >>>>>>
> > >>>>>> We only need to modify QEMU so that hot-plugged APs do not reply to
> > >>>>>> INIT/SIPI/SMI.
> > >>>>>>
> > >>>>>>> I don’t think there is a problem for real hardware, which always has CAR.
> > >>>>>>> Can QEMU provide some CPU specific space, such as MMIO region?
> > >>>>>>
> > >>>>>> Why is a CPU-specific region needed if every other processor is in SMM
> > >>>>>> and thus trusted?
> > >>>>>
> > >>>>> I was going through the steps Jiewen and Yingwen recommended.
> > >>>>>
> > >>>>> In step (02), the new CPU is expected to set up RAM access. In step
> > >>>>> (03), the new CPU, executing code from flash, is expected to "send board
> > >>>>> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
> > >>>>> message." For that action, the new CPU may need a stack (minimally if we
> > >>>>> want to use C function calls).
> > >>>>>
> > >>>>> Until step (03), there had been no word about any other (= pre-plugged)
> > >>>>> CPUs (more precisely, Jiewen even confirmed "No impact to other
> > >>>>> processors"), so I didn't assume that other CPUs had entered SMM.
> > >>>>>
> > >>>>> Paolo, I've attempted to read Jiewen's response, and yours, as carefully
> > >>>>> as I can. I'm still very confused. If you have a better understanding,
> > >>>>> could you please write up the 15-step process from the thread starter
> > >>>>> again, with all QEMU customizations applied? Such as, unnecessary steps
> > >>>>> removed, and platform specifics filled in.
> > >>>>
> > >>>> Sure.
> > >>>>
> > >>>> (01a) QEMU: create new CPU. The CPU already exists, but it does not
> > >>>>       start running code until unparked by the CPU hotplug controller.
> > >>>>
> > >>>> (01b) QEMU: trigger SCI
> > >>>>
> > >>>> (02-03) no equivalent
> > >>>>
> > >>>> (04) Host CPU: (OS) execute GPE handler from DSDT
> > >>>>
> > >>>> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
> > >>>>      will not enter SMM because SMI is disabled)
> > >>>>
> > >>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> > >>>>      rebase code.
> > >>>>
> > >>>> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
> > >>>>       new CPU
> > >>>>
> > >>>> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
> > >>> [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
> > >>> restriction that INIT/SIPI/SIPI can only be sent in SMM.
> > >>
> > >> All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded
> > >> before 07a, so this is okay.
> > > [Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is delivered at 07a?
> > > I don’t see any extra step between 06 and 07a.
> > > What is the magic here?
> >
> > The magic is 07a itself, IIUC. The CPU hotplug controller would be
> > accessible only in SMM. And until 07a happens, the new CPU ignores
> > INIT/SIPI/SIPI even if another CPU sends it those, simply because QEMU
> > would implement the new CPU's behavior like that.
[Jiewen] Got it. Looks fine to me.
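
The gating in 07a can be sketched as a toy model. This is only an illustration of the behavior described above, not QEMU code: the names `ParkedCpu`, `HotplugController`, `deliver` and `unpark` are hypothetical, and the "caller is in SMM" check is reduced to a boolean argument.

```python
from dataclasses import dataclass, field


@dataclass
class ParkedCpu:
    """Toy model of a hot-plugged vCPU (hypothetical, not a QEMU type)."""
    parked: bool = True
    log: list = field(default_factory=list)

    def deliver(self, event: str) -> bool:
        # While parked, INIT/SIPI/SMI are silently dropped (steps 01a to 06).
        if self.parked:
            return False
        self.log.append(event)
        return True


class HotplugController:
    """Toy model of a controller that only accepts requests made from SMM."""

    def unpark(self, cpu: ParkedCpu, caller_in_smm: bool) -> None:
        if not caller_in_smm:
            raise PermissionError("hotplug controller is only accessible in SMM")
        cpu.parked = False  # step 07a


cpu = ParkedCpu()
ctrl = HotplugController()

# A malicious CPU sends INIT/SIPI/SIPI before 07a: every event is dropped.
early = [cpu.deliver(e) for e in ("INIT", "SIPI", "SIPI")]

# Outside SMM, the controller refuses the unpark request.
try:
    ctrl.unpark(cpu, caller_in_smm=False)
    refused = False
except PermissionError:
    refused = True

# Inside SMM, 07a unparks the CPU, and the 07b sequence is then delivered.
ctrl.unpark(cpu, caller_in_smm=True)
late = [cpu.deliver(e) for e in ("INIT", "SIPI", "SIPI")]
```

In this model the only transition out of the parked state goes through the SMM-only controller, which is the property the security argument relies on.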
> > >> However I do see a problem, because a PCI device's DMA could overwrite
> > >> 0x38000 between (06) and (10) and hijack the code that is executed in
> > >> SMM. How is this avoided on real hardware? By the time the new CPU
> > >> enters SMM, it doesn't run off cache-as-RAM anymore.
> > > [Jiewen] Interesting question.
> > > I don’t think the DMA attack is considered in threat model for the virtual environment. We only list adversary below:
> > > -- Adversary: System Software Attacker, who can control any OS memory or silicon register from OS level, or read write BIOS data.
> > > -- Adversary: Simple hardware attacker, who can hot add or hot remove a CPU.
> >
> > We do have physical PCI(e) device assignment; sorry for not highlighting
> > that earlier.
[Jiewen] That is OK. Then we MUST add the third adversary.
-- Adversary: Simple hardware attacker, who can use a device to perform a DMA attack in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is handled by the IOMMU in the real world, such as VTd. -- Please do clarify if this is TRUE.

In the real world:
#1: the SMM MUST be a non-DMA capable region.
#2: the MMIO MUST be a non-DMA capable region.
#3: the stolen memory MIGHT be a DMA capable region or a non-DMA capable region. It depends upon the silicon design.
#4: the normal OS accessible memory - including ACPI reclaim, ACPI NVS, and reserved memory not included by #3 - MUST be a DMA capable region.
As such, IOMMU protection is NOT required for #1 and #2. IOMMU protection MIGHT be required for #3 and MUST be required for #4.
I assume the virtual environment is designed in the same way. Please correct me if I am wrong.

> > That feature (VFIO) does rely on the (physical) IOMMU, and
> > it makes sure that the assigned device can only access physical frames
> > that belong to the virtual machine that the device is assigned to.
[Jiewen] Thank you! Good to know.
I found https://www.kernel.org/doc/Documentation/vfio.txt
Is that what you described above?
Anyway, I believe the problem is clear and the solution in the real world is clear.
I will leave the virtual world discussion to Alex, Paolo, Laszlo.
If you need any of my input, please let me know.

> > However, as far as I know, VFIO doesn't try to restrict PCI DMA to
> > subsets of guest RAM... I could be wrong about that, I vaguely recall
> > RMRR support, which seems somewhat related.
> >
> > > I agree it is a threat from real hardware perspective. SMM may check VTd to make sure the 38000 is blocked.
> > > I doubt if it is a threat in virtual environment. Do we have a way to block DMA in virtual environment?
> >
> > I think that would be a VFIO feature.
> >
> > Alex: if we wanted to block PCI(e) DMA to a specific part of guest RAM
> > (expressed with guest-physical RAM addresses), perhaps permanently,
> > perhaps just for a while -- not sure about coordination though --, could
> > VFIO accommodate that (I guess by "punching holes" in the IOMMU page
> > tables)?
>
> It depends. For starters, the vfio mapping API does not allow
> unmapping arbitrary sub-ranges of previous mappings. So the hole you
> want to punch would need to be independently mapped. From there you
> get into the issue of whether this range is a potential DMA target. If
> it is, then this is the path to data corruption. We cannot interfere
> with the operation of the device and we have little to no visibility of
> active DMA targets.
>
> If we're talking about RAM that is never a DMA target, perhaps e820
> reserved memory, then we can make sure certain MemoryRegions are
> skipped when mapped by QEMU and would expect the guest to never map
> them through a vIOMMU as well. Maybe then it's a question of where
> we're trying to provide security (it might be more difficult if QEMU
> needs to sanitize vIOMMU mappings to actively prevent mapping
> reserved areas).
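
To make the "independently mapped" point concrete, here is a minimal sketch. It is a toy model and does not call the real VFIO ioctls: `ToyIommu` and `split_around_hole` are hypothetical names, the rule "an unmap must match a previous map exactly" is a simplification of the restriction Alex describes, and the 1 MiB RAM range and the 4 KiB hole at 0x38000 are assumed values chosen for illustration.

```python
def split_around_hole(start, size, hole_start, hole_size):
    """Return (iova, size, is_hole) pieces that together cover the range."""
    end, hole_end = start + size, hole_start + hole_size
    if not (start <= hole_start and hole_end <= end):
        raise ValueError("hole must lie inside the RAM range")
    pieces = []
    if hole_start > start:
        pieces.append((start, hole_start - start, False))
    pieces.append((hole_start, hole_size, True))
    if end > hole_end:
        pieces.append((hole_end, end - hole_end, False))
    return pieces


class ToyIommu:
    """Toy mapping table: sub-ranges of an existing mapping cannot be unmapped."""

    def __init__(self):
        self.maps = set()

    def map(self, iova, size):
        self.maps.add((iova, size))

    def unmap(self, iova, size):
        if (iova, size) not in self.maps:
            raise ValueError("cannot unmap a sub-range of an existing mapping")
        self.maps.remove((iova, size))


RAM_START, RAM_SIZE = 0x0, 0x100000      # assumed: 1 MiB of guest RAM
HOLE_START, HOLE_SIZE = 0x38000, 0x1000  # assumed: one 4 KiB page at 0x38000

# One big mapping: the hole cannot be punched afterwards.
one_shot = ToyIommu()
one_shot.map(RAM_START, RAM_SIZE)
try:
    one_shot.unmap(HOLE_START, HOLE_SIZE)
    punched = True
except ValueError:
    punched = False

# Three independent mappings: the hole can be removed on its own.
pieces = split_around_hole(RAM_START, RAM_SIZE, HOLE_START, HOLE_SIZE)
split = ToyIommu()
for iova, size, _ in pieces:
    split.map(iova, size)
split.unmap(HOLE_START, HOLE_SIZE)
```

The sketch says nothing about the harder question raised above, namely whether the range is an active DMA target when the hole is punched.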
>
> Is there anything unique about the VM case here? Bare metal SMM needs
> to be concerned about protecting itself from I/O devices that operate
> outside of the realm of SMM mode as well, right? Is something "simple"
> like an AddressSpace switch necessary here, such that an I/O device
> always has a mapping to a safe guest RAM page while the vCPU
> AddressSpace can switch to some protected page? The IOMMU and vCPU
> mappings don't need to be the same. The vCPU is more under our control
> than the assigned device.
>
> FWIW, RMRRs are a VT-d specific mechanism to define an address range as
> persistently, identity mapped for one or more devices. IOW, the device
> would always map that range. I don't think that's what you're after
> here. RMRRs are also an abomination that I hope we never find a
> requirement for in a VM. Thanks,
>
> Alex
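
The AddressSpace switch idea can be illustrated with a small model. This is a sketch under stated assumptions, not the QEMU memory API: `ToyAddressSpace`, `vcpu_view` and the page labels are hypothetical, and "protected mode" stands for whatever state (for example SMM) would trigger the switch.

```python
PAGE_SHIFT = 12
SMI_ENTRY = 0x38000  # guest-physical address discussed in this thread


class ToyAddressSpace:
    """Toy address space: guest page number -> label of the backing page."""

    def __init__(self, name, pages):
        self.name = name
        self.pages = dict(pages)

    def resolve(self, gpa):
        return self.pages[gpa >> PAGE_SHIFT]


page = SMI_ENTRY >> PAGE_SHIFT

# The assigned device always sees an ordinary guest RAM page at this address.
device_as = ToyAddressSpace("device", {page: "ordinary-ram-page"})

# The vCPU has two views of the same guest-physical address.
vcpu_normal_as = ToyAddressSpace("vcpu-normal", {page: "ordinary-ram-page"})
vcpu_protected_as = ToyAddressSpace("vcpu-protected", {page: "protected-page"})


def vcpu_view(in_protected_mode):
    """Pick the vCPU address space for the current (hypothetical) mode."""
    return vcpu_protected_as if in_protected_mode else vcpu_normal_as


# Device DMA to 0x38000 lands in the ordinary page in either vCPU mode,
# so it cannot modify what the vCPU executes from the protected page.
dma_target = device_as.resolve(SMI_ENTRY)
vcpu_target = vcpu_view(True).resolve(SMI_ENTRY)
```

This only shows that the IOMMU view and the vCPU view of one address can differ. Whether such a switch is practical in QEMU is the open question in the mail above.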