On 1/25/24 10:43, Jan Beulich wrote:
> On 09.01.2024 22:51, Stewart Hildebrand wrote:
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -168,6 +168,9 @@ static void modify_decoding(const struct pci_dev *pdev, 
>> uint16_t cmd,
>>      if ( !rom_only )
>>      {
>>          pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>> +        /* Show DomU that we updated P2M */
>> +        header->guest_cmd &= ~PCI_COMMAND_MEMORY;
>> +        header->guest_cmd |= cmd & PCI_COMMAND_MEMORY;
>>          header->bars_mapped = map;
>>      }
> 
> I don't follow what the comment means to say. The bit in question has no
> real connection to the P2M, and the guest also may have no notion of the
> underlying hypervisor's internals. Likely connected to ...

Indeed. If the comment survives to v13, I'll update it to:

        /* Now that we updated P2M, show DomU change to PCI_COMMAND_MEMORY */

> 
>> @@ -524,9 +527,26 @@ static void cf_check cmd_write(
>>  {
>>      struct vpci_header *header = data;
>>  
>> +    if ( !is_hardware_domain(pdev->domain) )
>> +    {
>> +        const struct vpci *vpci = pdev->vpci;
>> +
>> +        if ( (vpci->msi && vpci->msi->enabled) ||
>> +             (vpci->msix && vpci->msix->enabled) )
>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>> +
>> +        /*
>> +         * Do not show change to PCI_COMMAND_MEMORY bit until we finish
>> +         * modifying P2M mappings.
>> +         */
>> +        header->guest_cmd = (cmd & ~PCI_COMMAND_MEMORY) |
>> +                            (header->guest_cmd & PCI_COMMAND_MEMORY);
>> +    }
> 
> ... the comment here, but then shouldn't it be that the guest can't even
> issue a 2nd cfg space access until the present write has been carried out?
> Otherwise I'd be inclined to claim that such a partial update is unlikely
> to be spec-conformant.

Due to the raise_softirq() call added in

  3e568fa9e19c ("vpci: fix deferral of long operations")

my current understanding is: when the guest toggles memory decoding, the guest 
vcpu doesn't resume execution until vpci_process_pending() and 
modify_decoding() have finished. So I think the guest should see a consistent 
state of the register, unless it was trying to read from a different vcpu than 
the one doing the writing.

Regardless, if the guest did have an opportunity to successfully read the 
partially updated state of the register, I'm not really spotting what part of 
the spec that would be a violation of. PCIe 6.1 has this description regarding 
the bit: "When this bit is Set" and "When this bit is Clear" the device will 
decode (or not) memory accesses. The spec doesn't seem to distinguish whether 
the host or the device itself is the one to set/clear the bit. One might even 
try to argue the opposite: allowing the bit to be toggled before the device 
reflects the change would be a violation of spec. Since the spec is ambiguous 
in this regard, I don't think either argument is particularly strong.

Chesterton's fence: the logic for deferring the update of PCI_COMMAND_MEMORY in 
guest_cmd was added between v10 and v11 of this series. I went back to look at 
the review comments on v10 [1], but the rationale is still not entirely clear 
to me. At the end of the day, with the information I have at hand, I suspect it 
would be fine either way (whether updating guest_cmd is deferred or not). If no 
other info comes to light, I'm leaning toward not deferring because it would be 
simpler to update the bit right away in cmd_write().

[1] https://lore.kernel.org/xen-devel/ZVy73iJ3E8nJHvgf@macbook.local/

> 
>> @@ -843,6 +885,15 @@ static int cf_check init_header(struct pci_dev *pdev)
>>      if ( cmd & PCI_COMMAND_MEMORY )
>>          pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & 
>> ~PCI_COMMAND_MEMORY);
>>  
>> +    /*
>> +     * Clear PCI_COMMAND_MEMORY and PCI_COMMAND_IO for DomUs, so they will
>> +     * always start with memory decoding disabled and to ensure that we 
>> will not
>> +     * call modify_bars() at the end of this function.
>> +     */
>> +    if ( !is_hwdom )
>> +        cmd &= ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
>> +    header->guest_cmd = cmd;
> 
> With PCI_COMMAND_MEMORY clear, the hw reg won't further be written on the
> success return path. Yet wouldn't we better clear PCI_COMMAND_IO also in
> hardware (until we properly support it)?

Yes, I'll clear PCI_COMMAND_IO in hardware too

> 
> I also think the insertion point for the new code isn't well chosen: The
> comment just out of context indicates that the code in context above is
> connected to the subsequent code. Whereas the addition is not.

I'll rearrange it

> 
>> --- a/xen/drivers/vpci/msi.c
>> +++ b/xen/drivers/vpci/msi.c
>> @@ -70,6 +70,15 @@ static void cf_check control_write(
>>  
>>          if ( vpci_msi_arch_enable(msi, pdev, vectors) )
>>              return;
>> +
>> +        /*
>> +         * Make sure domU doesn't enable INTx while enabling MSI.
>> +         */
> 
> Nit: This ought to be a single line comment, just like ...

OK, I'll make it a single line

> 
>> +        if ( !is_hardware_domain(pdev->domain) )
>> +        {
>> +            pci_intx(pdev, false);
>> +            pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
>> +        }
>>      }
>>      else
>>          vpci_msi_arch_disable(msi, pdev);
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -135,6 +135,13 @@ static void cf_check control_write(
>>          }
>>      }
>>  
>> +    /* Make sure domU doesn't enable INTx while enabling MSI-X. */
>> +    if ( new_enabled && !msix->enabled && !is_hardware_domain(pdev->domain) 
>> )
>> +    {
>> +        pci_intx(pdev, false);
>> +        pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
>> +    }
> 
> ... the similar code here has it.
> 
> In both cases, is it really appropriate to set the bit in guest view?

I added this based on Roger's comment at [2]. Roger, what do you think? I don't 
believe QEMU updates the guest view in this manner.

[2] 
https://lore.kernel.org/xen-devel/ZLqI65gmNj1XDBm4@MacBook-Air-de-Roger.local/

Reply via email to