On 07/29/17 01:15, Michael S. Tsirkin wrote:
> On Thu, Jul 27, 2017 at 03:58:58PM +0200, Laszlo Ersek wrote:
>> On 07/27/17 11:39, Marcel Apfelbaum wrote:
>>> On 27/07/2017 2:28, Michael S. Tsirkin wrote:
>>>> On Thu, Jul 27, 2017 at 12:54:07AM +0300, Alexander Bezzubikov wrote:
>>>>> 2017-07-26 22:43 GMT+03:00 Michael S. Tsirkin <m...@redhat.com>:
>>>>>> On Sun, Jul 23, 2017 at 01:15:41AM +0300, Aleksandr Bezzubikov wrote:
>>>>>>> On PCI init PCI bridges may need some
>>>>>>> extra info about bus number to reserve, IO, memory and
>>>>>>> prefetchable memory limits. QEMU can provide this
>>>>>>> with special
>>>>>>
>>>>>> with a special
>>>>>>
>>>>>>> vendor-specific PCI capability.
>>>>>>>
>>>>>>> Sizes of limits match ones from
>>>>>>> PCI Type 1 Configuration Space Header,
>>>>>>> number of buses to reserve occupies only 1 byte
>>>>>>> since it is the size of Subordinate Bus Number register.
>>>>>>>
>>>>>>> Signed-off-by: Aleksandr Bezzubikov <zuban...@gmail.com>
>>>>>>> ---
>>>>>>>   hw/pci/pci_bridge.c         | 27 +++++++++++++++++++++++++++
>>>>>>>   include/hw/pci/pci_bridge.h | 18 ++++++++++++++++++
>>>>>>>   2 files changed, 45 insertions(+)
>>>>>>>
>>>>>>> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
>>>>>>> index 720119b..8ec6c2c 100644
>>>>>>> --- a/hw/pci/pci_bridge.c
>>>>>>> +++ b/hw/pci/pci_bridge.c
>>>>>>> @@ -408,6 +408,33 @@ void pci_bridge_map_irq(PCIBridge *br, const
>>>>>>> char* bus_name,
>>>>>>>       br->bus_name = bus_name;
>>>>>>>   }
>>>>>>>
>>>>>>> +
>>>>>>> +int pci_bridge_help_cap_init(PCIDevice *dev, int cap_offset,
>>>>>>
>>>>>> help? should be qemu_cap_init?
>>>>>>
>>>>>>> +                              uint8_t bus_reserve, uint32_t io_limit,
>>>>>>> +                              uint16_t mem_limit, uint64_t
>>>>>>> pref_limit,
>>>>>>> +                              Error **errp)
>>>>>>> +{
>>>>>>> +    size_t cap_len = sizeof(PCIBridgeQemuCap);
>>>>>>> +    PCIBridgeQemuCap cap;
>>>>>>
>>>>>> This leaks info to guest. You want to init all fields here:
>>>>>>
>>>>>> cap = {
>>>>>>   .len = ....
>>>>>> };
>>>>>
>>>>> I surely can do this for len field, but as Laszlo proposed
>>>>> we can use mutually exclusive fields,
>>>>> e.g. pref_32 and pref_64, the only way I have left
>>>>> is to use ternary operator (if we surely need this
>>>>> big initializer). Keeping some if's would look better,
>>>>> I think.
>>>>>
>>>>>>
>>>>>>> +
>>>>>>> +    cap.len = cap_len;
>>>>>>> +    cap.bus_res = bus_reserve;
>>>>>>> +    cap.io_lim = io_limit & 0xFF;
>>>>>>> +    cap.io_lim_upper = io_limit >> 8 & 0xFFFF;
>>>>>>> +    cap.mem_lim = mem_limit;
>>>>>>> +    cap.pref_lim = pref_limit & 0xFFFF;
>>>>>>> +    cap.pref_lim_upper = pref_limit >> 16 & 0xFFFFFFFF;
>>>>>>
>>>>>> Please use pci_set_word etc or cpu_to_leXX.
>>>>>>
>>>>>
>>>>> Since now we've decided to avoid fields separation into <field> +
>>>>> <field_upper>,
>>>>> this bitmask along with pci_set_word are no longer needed.
>>>>>
>>>>>> I think it's easiest to replace struct with a set of macros then
>>>>>> pci_set_word does the work for you.
>>>>>>
>>>>>
>>>>> I don't really want to use macros here because structure
>>>>> show us the whole capability layout and this can
>>>>> decrease documenting efforts. More than that,
>>>>> memcpy usage is very convenient here, and I wouldn't like
>>>>> to lose it.
>>>>>
>>>>>>
>>>>>>> +
>>>>>>> +    int offset = pci_add_capability(dev, PCI_CAP_ID_VNDR,
>>>>>>> +                                    cap_offset, cap_len, errp);
>>>>>>> +    if (offset < 0) {
>>>>>>> +        return offset;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    memcpy(dev->config + offset + 2, (char *)&cap + 2, cap_len - 2);
>>>>>>
>>>>>> +2 is yacky. See how virtio does it:
>>>>>>
>>>>>>      memcpy(dev->config + offset + PCI_CAP_FLAGS, &cap->cap_len,
>>>>>>             cap->cap_len - PCI_CAP_FLAGS);
>>>>>>
>>>>>>
>>>>>
>>>>> OK.
>>>>>
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>>   static const TypeInfo pci_bridge_type_info = {
>>>>>>>       .name = TYPE_PCI_BRIDGE,
>>>>>>>       .parent = TYPE_PCI_DEVICE,
>>>>>>> diff --git a/include/hw/pci/pci_bridge.h b/include/hw/pci/pci_bridge.h
>>>>>>> index ff7cbaa..c9f642c 100644
>>>>>>> --- a/include/hw/pci/pci_bridge.h
>>>>>>> +++ b/include/hw/pci/pci_bridge.h
>>>>>>> @@ -67,4 +67,22 @@ void pci_bridge_map_irq(PCIBridge *br, const
>>>>>>> char* bus_name,
>>>>>>>   #define  PCI_BRIDGE_CTL_DISCARD_STATUS       0x400   /* Discard
>>>>>>> timer status */
>>>>>>>   #define  PCI_BRIDGE_CTL_DISCARD_SERR 0x800   /* Discard timer
>>>>>>> SERR# enable */
>>>>>>>
>>>>>>> +typedef struct PCIBridgeQemuCap {
>>>>>>> +    uint8_t id;     /* Standard PCI capability header field */
>>>>>>> +    uint8_t next;   /* Standard PCI capability header field */
>>>>>>> +    uint8_t len;    /* Standard PCI vendor-specific capability
>>>>>>> header field */
>>>>>>> +    uint8_t bus_res;
>>>>>>> +    uint32_t pref_lim_upper;
>>>>>>
>>>>>> Big endian? Ugh.
>>>>>>
>>>>>
>>>>> Agreed, and this's gonna to disappear with
>>>>> the new layout.
>>>>>
>>>>>>> +    uint16_t pref_lim;
>>>>>>> +    uint16_t mem_lim;
>>>>>>
>>>>>> I'd say we need 64 bit for memory.
>>>>>>
>>>>>
>>>>> Why? Non-prefetchable MEMORY_LIMIT register is 16 bits long.
>>>>
>>>> Hmm ok, but e.g. for io there are bridges that have extra registers
>>>> to specify non-standard non-aligned registers.
>>>>
>>>>>>> +    uint16_t io_lim_upper;
>>>>>>> +    uint8_t io_lim;
>>>>>>> +    uint8_t padding;
>>>>>>
>>>>>> IMHO each type should have a special "don't care" flag
>>>>>> that would mean "I don't know".
>>>>>>
>>>>>>
>>>>>
>>>>> Don't know what? Now 0 is an indicator to do nothing with this field.
>>>>
>>>> In that case how do you say "don't allocate any memory"?
>>>>
>>>
>>> We can keep the MEM/Limit registers read-only for such cases,
>>> as they are optional registers.
>>
>> For OVMF, it would be really nice if OVMF could gather the reservation
>> numbers with
>> - read-only accesses
>> - to the one exact hotplug controller (root port, bridge, etc) that's
>>   being queried for reservation.
>>
>> Deciding whether some registers in config space are r/o would require
>> OVMF to attempt a write and check the result (and if it succeeded, to
>> restore the original value). I'm not too attracted to doing this in a
>> platform hook, while PciBusDxe is in the middle of enumerating /
>> configuring the PCI(e) hierarchy :)
>>
>> Thanks
>> Laszlo
> 
> Well that's the PCI spec, I don't think there's a choice for
> regular bridges. If we have this capability this is a possible
> optimization.
> But without it, if you assign ranges without checking whether they are
> available they won't have any effect practically.
> It's mostly harmless, but you are going to waste resources.
> 

OK. If the proposed solution with the r/o mem base/limit registers is
rooted in the spec (and I think it indeed must be; apparently this would
be the same as what we're already planning for IO disablement), then
that's a strong argument for PciBusDxe to accommodate this probing in
the platform hook.

Thanks
Laszlo

Reply via email to