On 06/03/2015 01:20 PM, Laszlo Ersek wrote:
> On 06/03/15 10:15, Marcel Apfelbaum wrote:
>> On 06/02/2015 07:25 PM, Laszlo Ersek wrote:
>>> On 06/02/15 17:04, Marcel Apfelbaum wrote:
>>>> Hi,
>>>>
>>>> The following series:
>>>>      - [Qemu-devel] [PATCH V8 00/17] hw/pc: implement multiple primary
>>>>        busses for pc machines
>>>>      - https://www.mail-archive.com/qemu-devel@nongnu.org/msg300089.html
>>>>        adds a PCI Expander Device to QEMU that exposes a new PCI root
>>>>        bus.
>>>
>>> (Let's tie this thread to the v7 question too:
>>>
>>> http://thread.gmane.org/gmane.comp.emulators.qemu/338583/focus=338599
>>> )
>>>
>>>> The PXB is a "light-weight" host bridge whose purpose is to enable
>>>> the main host bridge to support multiple PCI root buses.
>>>>
>>>> It does not have its own registers for configuration cycles; instead it
>>>> snoops on the main host bridge's registers, and it lives on the same PCI
>>>> segment.
>>>>
>>>> The device receives from the command line the bus number and expects
>>>> the firmware (bios/UEFI) to probe the bus for devices behind it and
>>>> configure them.
>>>>
>>>> My question is how can it be supported in edk2? Are there any
>>>> architecture limitations that will prevent it from working?
>>>>
>>>> My edk2/UEFI knowledge is rather limited, but I did see in the spec
>>>> that there is support for this kind of device:
>>>>
>>>>        13.1.1 PCI Root Bridge I/O Overview
>>>>        ...
>>>>        Depending on the chipset, a single
>>>>        EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL may abstract a portion of a PCI
>>>>        Segment, or an entire PCI Segment. A PCI Host Bridge may produce
>>>>        one or more PCI Root Bridges. When a PCI Host Bridge produces
>>>>        multiple PCI Root Bridges, it is possible to have more than one
>>>>        PCI Segment.
>>>>        ...
>>>>
>>>> It seems that multiple EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL instances for
>>>> the same PCI Host Bridge mapped into the same PCI Segment is the
>>>> answer. First instance belongs to the "main" host bridge and the other
>>>> to the PXBs.
>>>>
>>>> The open questions are of course how to assign resources (bus
>>>> numbers/IO/MEM) to the EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL instances.
>>>>
>>>> For the bus numbers I think that the PCI Host Bridge can scan the 0x0
>>>> - 0xff range and incrementally build the bus ranges.
>>>>
>>>> Regarding IO/MEM ranges I am still not sure. The way it is done in
>>>> SeaBIOS is that all devices behind PXB root bus are "considered" as
>>>> being behind bus 0 for resources allocation. Once the resources
>>>> allocation is done, each EFI_PCI_ROOT_BRIDGE gets the list of MEM/IO
>>>> ranges corresponding with the devices behind them.
>>>>
>>>> Any comments and suggestions would be greatly appreciated.
>>>> Thank you in advance,
>>>> Marcel
>>>
>>> I'm attaching a horrible patch (applies on top of edk2 SVN r17543, aka
>>> git commit d4848bb9df) that allows OVMF to recognize the e1000 NIC with
>>> the following QEMU command line:
>>>
>>> ISO=/mnt/data/isos/Fedora-Live-Xfce-x86_64-20-1.iso
>>> CODE=/home/virt-images/OVMF_CODE.fd
>>> TMPL=/home/virt-images/OVMF_VARS.fd
>>>
>>> cp $TMPL vars.fd
>>>
>>> qemu-system-x86_64 \
>>>     -m 2048 \
>>>     -M pc \
>>>     -enable-kvm \
>>>     -device qxl-vga \
>>>     -drive if=pflash,readonly,format=raw,file=$CODE \
>>>     -drive if=pflash,format=raw,file=vars.fd \
>>>     -drive id=cdrom,if=none,readonly,format=raw,file=$ISO \
>>>     -device virtio-scsi-pci,id=scsi0 \
>>>     -device scsi-cd,bus=scsi0.0,drive=cdrom,bootindex=0 \
>>>     -debugcon file:debug.log \
>>>     -global isa-debugcon.iobase=0x402 \
>>>     -device pxb,id=bridge1,bus_nr=128 \
>>>     -netdev user,id=netdev0,hostfwd=tcp:127.0.0.1:2222-:22 \
>>>     -device e1000,netdev=netdev0,bus=bridge1,addr=1 \
>>>     -monitor stdio
>>>
>>> With this hack in place, and using the above QEMU command line,
>>> "debug.log" bears witness to the PCI enumeration succeeding, and the
>>> "PCI" command in the UEFI shell lists the e1000 NIC.
>> Hi Laszlo,
>>
>> This is very good news; it means that the device *can work* with edk2.
>>
>>>
>>> I agree with your analysis that the way to support this QEMU feature in
>>> OVMF is to produce several EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL instances.
>>> Beyond that agreement, I must say that invalidating the assumption that
>>> "there is only one root bridge" breaks about everything in OVMF.
>>>
>>> Just skimming my hack-patch identifies the problems (or in some cases,
>>> questions):
>>>
>>> * The PCI host bridge driver under PcAtChipsetPkg needs to be cloned
>>> under OvmfPkg, and (as you say) the bus ranges need to be determined
>>> dynamically. On IRC you said that probing for device 0 on a bus is
>>> sufficient to see if the bus lives, but for now I'm unsure if this would
>>> be a layering violation or not for the UEFI protocols in question. Maybe
>>> not.
>>>
>>> * The bus ranges assigned to each "pxb" device (ie. root bridge) would
>>> have to be able to accommodate any subordinate buses enumerated off that
>>> root bridge. At least this is what PciRootBridgeEnumerator() in
>>> "MdeModulePkg/Bus/Pci/PciBusDxe/PciEnumerator.c" seems to require. I've
>>> got no clue how to size the bus ranges properly for the root bridges to
>>> satisfy this.
>>>
>>> * In fact, the bus range presented over the
>>> EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL.Configuration() function cannot be just
>>> a range. As far as I tested the PCI bus driver (see path above), it
>>> doesn't find anything if the range retrieved from
>>> EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL.Configuration() doesn't start *exactly*
>>> with a live bus. In other words, it doesn't look for sibling buses, or
>>> independent buses in the bus range exposed by the root bridge protocol.
>>> It looks for *one* bus *at* the start of the range (and subordinate
>>> buses hanging off of that). I have absolutely no clue why this is so,
>>> but it means that for each pxb found, one root bridge io protocol would
>>> have to be produced, and that proto should expose one bus range, with
>>> the low end matching exactly the bus number, and the high end enabling
>>> child buses to be enumerated.
>>>
>>> * In the attached hack, I'm splitting the pre-patch, static, IO & MMIO
>>> apertures in the middle. Maybe they could be the same shared ranges, as
>>> you say. I don't know.
>>>
>>> * In the OVMF BDS (boot device selection) code, we manually connect the
>>> only one root bridge. This would have to be made dynamic, to connect all
>>> of them. This connection basically amounts to "starting the enumeration".
>>>
>>> * The OVMF boot order processing code hardcodes PciRoot(0x0) in a bunch
>>> of device path matching logic. That would not be appropriate any longer.
>>> In fact the above command line should boot the fedora live CD, but it
>>> doesn't, and in the UEFI setup utility I cannot even browse the CD
>>> filesystem.
>>>
>>> * Gabriel wrote earlier some code for setting the INTx interrupt pin
>>> registers of all PCI devices in OVMF's BDS. That code breaks now, an
>>> assert is triggered ("PCI host bridge (00:00.0) should have no
>>> interrupts"). Not sure why this happens.
>>>
>>> * The UEFI device paths for the PCI root bridges (textually,
>>> PciRoot(0x0), PciRoot(0x1) etc) actually start with ACPI device path
>>> nodes. They consist of a PNP0A03 _HID and a numeric _UID. If my reading
>>> of the UEFI spec is correct, the _UIDs that OVMF would assign to these
>>> device path nodes would have to match the *actual* ACPI payload that
>>> QEMU exports. The _UID assignment is now static (just a 0), and my
>>> hack-patch assigns a static 1 to the "other" root bridge's device path.
>>> This is not good. OVMF would either have to parse ACPI payload
>>> (horrible) or get the _UID<->pxb assignment via fw_cfg.
>>>
>>> That's all the carnage I can think of right now, but I'm sure this is
>>> just the tip of the iceberg. This would be a very large project, and
>>> QEMU might have to expose a lot more info over fw_cfg than it does now.
>>>
>>> In any case, the device model itself could be digestible for OVMF, based
>>> on the results of this hack.
>>
>>
>> Thanks a lot for your analysis.
>> Since I am new to edk2, I cannot take this project by myself, but if PCI
>> guys can come up with a plan or design, I'll be glad to implement it,
>> or at least to contribute to it.
>
> After sleeping on it :), I'd certainly like to find the time to
> collaborate on this myself.
Great!

> Maybe we can experiment some more; for
> example we could start by you explaining to me how exactly to probe for
> a root bus's presence (you mentioned device 0, but I'll need more than
> that).
Well, I lied. :)
I just had a look at SeaBIOS, and it does the following:
- It receives the number of extra root buses through a fw_cfg file.
- It scans from bus 0 to bus 0xff until it has discovered all the extra
   root buses. The 'discovery' is "go over all of the bus's slots and
   probe for a non-empty PCI header". If you find at least one device, you
   have just discovered a new PCI root bus.

I think that we can improve the fw_cfg file to pass the actual
bus numbers and not only the total. That way it should be relatively easy
for edk2 to handle the extra root buses.
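
The discovery loop described above can be sketched roughly as follows. This
is only an illustration, not SeaBIOS or edk2 code: `read_vendor_id` is a
hypothetical stand-in for a PCI configuration space read, and the sketch
assumes that no PCI-to-PCI bridge has been programmed yet, so only root
buses answer during the scan.

```python
# Illustrative sketch of the scan; all names here are hypothetical.

def bus_has_device(bus, read_vendor_id):
    """Return True if any of the 32 slots on `bus` answers at function 0."""
    for slot in range(32):
        # An absent device reads back as all ones (0xFFFF) in the vendor ID.
        if read_vendor_id(bus, slot, 0) != 0xFFFF:
            return True
    return False


def find_extra_root_buses(extra_count, read_vendor_id):
    """Scan buses 1..0xff until `extra_count` extra root buses are found."""
    found = []
    for bus in range(1, 0x100):
        if len(found) == extra_count:
            break
        if bus_has_device(bus, read_vendor_id):
            found.append(bus)
    return found


def make_reader(devices):
    """Build a fake config space reader from {(bus, slot, func): vendor_id}."""
    def read_vendor_id(bus, slot, func):
        return devices.get((bus, slot, func), 0xFFFF)
    return read_vendor_id


# Simulated machine: one device on bus 0 and one on bus 128, slot 1.
devices = {(0, 0, 0): 0x8086, (128, 1, 0): 0x8086}
print(find_extra_root_buses(1, make_reader(devices)))  # [128]
```

If the fw_cfg file carried the bus numbers themselves, the scan would not be
needed at all; the firmware could use the list directly.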

>
> For the bus range allocation, here's an idea:
> - create a bitmap with 256 bits (32 bytes) with all bits zero
> - probe all root buses; whatever is found, flip its bit to 1
> - assuming N root buses were found, divide the number of remaining zero
>    bits by N. The quotient Q means how many subordinate buses each root
>    bus would be able to accommodate
> - for each root bus:
>    - create an ACPI bus range descriptor that includes only the root
>      bus's number
>    - pull out Q zero bits from the bitmap, from the left, flipping them
>      to one as you proceed
>    - for each zero bit pulled, try to append that bus number to the ACPI
>      bus range descriptor (simply bumping the end). If there's a
>      discontinuity, start a new ACPI bus range descriptor.
>
> This greedy algorithm would grant each root bus the same number of
> possible subordinate buses, could be implemented in linear time, and
> would keep the individual bus ranges "reasonably continuous" (ie. there
> should be a reasonably low number of ACPI bus range descriptors, per
> root bus).
>
> What do you think? This wouldn't be a very hard patch to write, and then
> we could experiment with various -device pxb,bus_nr=xxx parameters.
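
For reference, the greedy allocation quoted above could look roughly like
this. It is a sketch of the steps as written, with hypothetical names, and it
assumes that the list of root buses is non-empty (bus 0 is always present).
Each "ACPI bus range descriptor" is modelled as an inclusive (first, last)
pair.

```python
# Illustrative sketch of the bitmap-based greedy allocation; the function
# name and the (first, last) tuple representation are assumptions.

def allocate_bus_ranges(root_buses):
    """Map each root bus to a list of inclusive (first, last) bus ranges."""
    roots = sorted(root_buses)
    used = [False] * 256                 # the 256-bit bitmap
    for bus in roots:
        used[bus] = True                 # flip the bit of every root bus
    quota = (256 - len(roots)) // len(roots)   # Q subordinate buses per root

    ranges = {}
    cursor = 0                           # only ever moves right: linear time
    for root in roots:
        descriptors = [[root, root]]     # starts with the root bus alone
        for _ in range(quota):
            while used[cursor]:          # find the leftmost zero bit
                cursor += 1
            used[cursor] = True
            if cursor == descriptors[-1][1] + 1:
                descriptors[-1][1] = cursor          # bump the end
            else:
                descriptors.append([cursor, cursor])  # discontinuity
        ranges[root] = [tuple(d) for d in descriptors]
    return ranges


print(allocate_bus_ranges([0, 128]))
# {0: [(0, 127)], 128: [(128, 255)]}
```

With root buses that sit close together, for example 0, 8 and 16, the same
function returns several descriptors per root bus, because the free bus
numbers pulled from the left are interrupted by the other root buses.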

Well, it looks nice but I think that we can do something much simpler :)
Let's continue the above idea that QEMU passes to edk2 the *extra* root bus
numbers, in ascending order for simplicity.
For example 8,16,32. From here you can derive that the bus ranges are:
0-7 host bridge 0
8-15 pxb root bridge 1
16-31 pxb root bridge 2
32-0xff pxb root bridge 3

BTW, as far as I know, this is the way that real hw divides the ranges.
Limitation:
   - How do you know you have enough bus numbers for a host bridge to cover
     all PCI-2-PCI bridges behind it? Let's say bus 0 has 10 bridges; the
     0-7 range is not enough.
Reasoning:
   - This is a *hw vendor* issue, not a firmware one; in our case QEMU
     should check that the ranges are sufficient before starting edk2.
In conclusion, this assumption does not break anything or impose a big
limitation.
And SeaBIOS already assumes that... and QEMU is not going to break it.
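
The derivation of the ranges from the list of extra root bus numbers can be
sketched as below. The function name is hypothetical, and the input is
assumed to be the list QEMU would pass (for example over fw_cfg); each range
simply runs up to the bus just before the next root bus, and the last one
ends at 0xff.

```python
# Illustrative sketch; the function name and the input format are assumptions.

def ranges_from_extra_roots(extra_root_buses):
    """Return inclusive (first, last) bus ranges, host bridge 0 first."""
    extra = sorted(extra_root_buses)
    if len(set(extra)) != len(extra) or any(not 1 <= b <= 0xFF for b in extra):
        raise ValueError("extra root buses must be unique and within 1..0xff")
    starts = [0] + extra
    # Each range ends right before the next root bus; the last ends at 0xff.
    ends = [nxt - 1 for nxt in starts[1:]] + [0xFF]
    return list(zip(starts, ends))


print(ranges_from_extra_roots([8, 16, 32]))
# [(0, 7), (8, 15), (16, 31), (32, 255)]
```

Whether a given range is large enough for the bridges behind its root bus is
not checked here, in line with the reasoning above that this is for QEMU to
verify.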

>
> The MMIO and IO spaces I would just share between all of them; the
> allocations from those are delegated back to the host bridge / root
> bridge driver, and the current implementation seems sufficient -- it
> just assigns blocks from the same big MMIO ( / IO) space downwards
Yes, this is how it should be done; I am happy that it already works that way.

Thanks,
Marcel

>
> Thanks
> Laszlo
>

