On 06/03/15 10:15, Marcel Apfelbaum wrote:
> On 06/02/2015 07:25 PM, Laszlo Ersek wrote:
>> On 06/02/15 17:04, Marcel Apfelbaum wrote:
>>> Hi,
>>>
>>> The following series:
>>>     - [Qemu-devel] [PATCH V8 00/17] hw/pc: implement multiple primary
>>>       busses for pc machines
>>>     - https://www.mail-archive.com/qemu-devel@nongnu.org/msg300089.html
>>>       adds a PCI Expander Device to QEMU that exposes a new PCI root
>>>       bus.
>>
>> (Let's tie this thread to the v7 question too:
>>
>> http://thread.gmane.org/gmane.comp.emulators.qemu/338583/focus=338599
>> )
>>
>>> The PXB is a "light-weight" host bridge whose purpose is to enable
>>> the main host bridge to support multiple PCI root buses.
>>>
>>> It does not have its own registers for configuration cycles, but
>>> snoops on the main host bridge registers, and it lives on the same PCI
>>> segment.
>>>
>>> The device receives from the command line the bus number and expects
>>> the firmware (bios/UEFI) to probe the bus for devices behind it and
>>> configure them.
>>>
>>> My question is how can it be supported in edk2? Are there any
>>> architecture limitations that will prevent it from working?
>>>
>>> My edk2/UEFI knowledge is rather limited, but I did see in the spec
>>> that there is support for this kind of device:
>>>
>>>       13.1.1 PCI Root Bridge I/O Overview
>>>       ...
>>>       Depending on the chipset, a single
>>>       EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL may abstract a portion of a PCI
>>>       Segment, or an entire PCI Segment. A PCI Host Bridge may produce
>>>       one or more PCI Root Bridges. When a PCI Host Bridge produces
>>>       multiple PCI Root Bridges, it is possible to have more than one
>>>       PCI Segment.
>>>       ...
>>>
>>> It seems that multiple EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL instances for
>>> the same PCI Host Bridge mapped into the same PCI Segment is the
>>> answer. First instance belongs to the "main" host bridge and the other
>>> to the PXBs.
>>>
>>> The open questions are of course how to assign resources (bus
>>> numbers/IO/MEM) to the EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL instances.
>>>
>>> For the bus numbers I think that the PCI Host Bridge can scan the 0x0
>>> - 0xff range and incrementally build the bus ranges.
>>>
>>> Regarding IO/MEM ranges I am still not sure. The way it is done in
>>> SeaBIOS is that all devices behind PXB root bus are "considered" as
>>> being behind bus 0 for resources allocation. Once the resources
>>> allocation is done, each EFI_PCI_ROOT_BRIDGE gets the list of MEM/IO
>>> ranges corresponding with the devices behind them.
>>>
>>> Any comments and suggestions would be greatly appreciated.
>>> Thank you in advance,
>>> Marcel
>>
>> I'm attaching a horrible patch (applies on top of edk2 SVN r17543, aka
>> git commit d4848bb9df) that allows OVMF to recognize the e1000 NIC with
>> the following QEMU command line:
>>
>> ISO=/mnt/data/isos/Fedora-Live-Xfce-x86_64-20-1.iso
>> CODE=/home/virt-images/OVMF_CODE.fd
>> TMPL=/home/virt-images/OVMF_VARS.fd
>>
>> cp $TMPL vars.fd
>>
>> qemu-system-x86_64 \
>>    -m 2048 \
>>    -M pc \
>>    -enable-kvm \
>>    -device qxl-vga \
>>    -drive if=pflash,readonly,format=raw,file=$CODE \
>>    -drive if=pflash,format=raw,file=vars.fd \
>>    -drive id=cdrom,if=none,readonly,format=raw,file=$ISO \
>>    -device virtio-scsi-pci,id=scsi0 \
>>    -device scsi-cd,bus=scsi0.0,drive=cdrom,bootindex=0 \
>>    -debugcon file:debug.log \
>>    -global isa-debugcon.iobase=0x402 \
>>    -device pxb,id=bridge1,bus_nr=128 \
>>    -netdev user,id=netdev0,hostfwd=tcp:127.0.0.1:2222-:22 \
>>    -device e1000,netdev=netdev0,bus=bridge1,addr=1 \
>>    -monitor stdio
>>
>> With this hack in place, and using the above QEMU command line,
>> "debug.log" bears witness to the PCI enumeration succeeding, and the
>> "PCI" command in the UEFI shell lists the e1000 NIC.
> Hi Laszlo,
> 
> This is very good news; it means that the device *can work* with edk2.
> 
>>
>> I agree with your analysis that the way to support this QEMU feature in
>> OVMF is to produce several EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL instances.
>> Beyond that agreement, I must say that invalidating the assumption that
>> "there is only one root bridge" breaks about everything in OVMF.
>>
>> Just skimming my hack-patch identifies the problems (or in some cases,
>> questions):
>>
>> * The PCI host bridge driver under PcAtChipsetPkg needs to be cloned
>> under OvmfPkg, and (as you say) the bus ranges need to be determined
>> dynamically. On IRC you said that probing for device 0 on a bus is
>> sufficient to see if the bus lives, but for now I'm unsure if this would
>> be a layering violation or not for the UEFI protocols in question. Maybe
>> not.
>>
>> * The bus ranges assigned to each "pxb" device (ie. root bridge) would
>> have to be able to accommodate any subordinate buses enumerated off that
>> root bridge. At least this is what PciRootBridgeEnumerator() in
>> "MdeModulePkg/Bus/Pci/PciBusDxe/PciEnumerator.c" seems to require. I've
>> got no clue how to size the bus ranges properly for the root bridges to
>> satisfy this.
>>
>> * In fact, the bus range presented over the
>> EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL.Configuration() function cannot be just
>> a range. As far as I tested the PCI bus driver (see path above), it
>> doesn't find anything if the range retrieved from
>> EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL.Configuration() doesn't start *exactly*
>> with a live bus. In other words, it doesn't look for sibling buses, or
>> independent buses in the bus range exposed by the root bridge protocol.
>> It looks for *one* bus *at* the start of the range (and subordinate
>> buses hanging off of that). I have absolutely no clue why this is so,
>> but it means that for each pxb found, one root bridge io protocol would
>> have to be produced, and that proto should expose one bus range, with
>> the low end matching exactly the bus number, and the high end enabling
>> child buses to be enumerated.
>>
>> * In the attached hack, I'm splitting the pre-patch, static, IO & MMIO
>> apertures in the middle. Maybe they could be the same shared ranges, as
>> you say. I don't know.
>>
>> * In the OVMF BDS (boot device selection) code, we manually connect the
>> only one root bridge. This would have to be made dynamic, to connect all
>> of them. This connection basically amounts to "starting the enumeration".
>>
>> * The OVMF boot order processing code hardcodes PciRoot(0x0) in a bunch
>> of device path matching logic. That would not be appropriate any longer.
>> In fact the above command line should boot the fedora live CD, but it
>> doesn't, and in the UEFI setup utility I cannot even browse the CD
>> filesystem.
>>
>> * Gabriel wrote earlier some code for setting the INTx interrupt pin
>> registers of all PCI devices in OVMF's BDS. That code breaks now, an
>> assert is triggered ("PCI host bridge (00:00.0) should have no
>> interrupts"). Not sure why this happens.
>>
>> * The UEFI device paths for the PCI root bridges (textually,
>> PciRoot(0x0), PciRoot(0x1) etc) actually start with ACPI device path
>> nodes. They consist of a PNP0A03 _HID and a numeric _UID. If my reading
>> of the UEFI spec is correct, the _UIDs that OVMF would assign to these
>> device path nodes would have to match the *actual* ACPI payload that
>> QEMU exports. The _UID assignment is now static (just a 0), and my
>> hack-patch assigns a static 1 to the "other" root bridge's device path.
>> This is not good. OVMF would either have to parse ACPI payload
>> (horrible) or get the _UID<->pxb assignment via fw_cfg.
>>
>> That's all the carnage I can think of right now, but I'm sure this is
>> just the tip of the iceberg. This would be a very large project, and
>> QEMU might have to expose a lot more info over fw_cfg than it does now.
>>
>> In any case, the device model itself could be digestible for OVMF, based
>> on the results of this hack.
> 
> 
> Thanks a lot for your analysis.
> Since I am new to edk2, I cannot take on this project by myself, but if PCI
> guys can come up with a plan or design, I'll be glad to implement it,
> or at least to contribute to it.

After sleeping on it :), I'd certainly like to find the time to
collaborate on this myself. Maybe we can experiment some more; for
example we could start by you explaining to me how exactly to probe for
a root bus's presence (you mentioned device 0, but I'll need more than
that).

For the bus range allocation, here's an idea:
- create a bitmap with 256 bits (32 bytes) with all bits zero
- probe all root buses; whatever is found, flip its bit to 1
- assuming N root buses were found, divide the number of remaining zero
  bits by N. The quotient Q means how many subordinate buses each root
  bus would be able to accommodate
- for each root bus:
  - create an ACPI bus range descriptor that includes only the root
    bus's number
  - pull out Q zero bits from the bitmap, from the left, flipping them
    to one as you proceed
  - for each zero bit pulled, try to append that bus number to the ACPI
    bus range descriptor (simply bumping the end). If there's a
    discontinuity, start a new ACPI bus range descriptor.

This greedy algorithm would grant each root bus the same number of
possible subordinate buses, could be implemented in linear time, and
would keep the individual bus ranges "reasonably continuous" (ie. there
should be a reasonably low number of ACPI bus range descriptors, per
root bus).
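
To make the steps above concrete, here is a minimal sketch in Python (not
edk2 code). The function name and the return format are made up for
illustration, and I'm assuming that root buses are processed in ascending
order and that "from the left" means lowest free bus number first. Each
(start, end) pair stands for one ACPI bus range descriptor.

```python
def allocate_bus_ranges(root_buses, total=256):
    """Split the free bus numbers evenly among the probed root buses.

    Returns {root_bus: [(start, end), ...]}, where each inclusive pair
    stands for one bus range descriptor. Hypothetical helper, for
    illustration only.
    """
    roots = sorted(set(root_buses))
    if not roots:
        return {}
    if roots[0] < 0 or roots[-1] >= total:
        raise ValueError("root bus number out of range")

    # The "bitmap": True means the bus number is already taken.
    used = [False] * total
    for bus in roots:
        used[bus] = True

    # Q: how many subordinate buses each root bus may receive.
    quota = (total - len(roots)) // len(roots)

    ranges = {}
    cursor = 0  # free bits are pulled left to right, so one cursor suffices
    for root in roots:
        # Start with a descriptor that covers only the root bus itself.
        descriptors = [[root, root]]
        pulled = 0
        while pulled < quota and cursor < total:
            bus = cursor
            cursor += 1
            if used[bus]:
                continue
            used[bus] = True
            pulled += 1
            if bus == descriptors[-1][1] + 1:
                # Contiguous: simply bump the end of the current descriptor.
                descriptors[-1][1] = bus
            else:
                # Discontinuity: start a new descriptor.
                descriptors.append([bus, bus])
        ranges[root] = [tuple(d) for d in descriptors]
    return ranges
```

For example, allocate_bus_ranges([0, 128]) gives each root bus 127
subordinate buses in a single descriptor: (0, 127) for bus 0 and
(128, 255) for bus 128. With root buses 0 and 10 the first root bus ends
up with two descriptors, (0, 9) and (11, 128), which is the discontinuity
case.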

What do you think? This wouldn't be a very hard patch to write, and then
we could experiment with various -device pxb,bus_nr=xxx parameters.

The MMIO and IO spaces I would just share between all of them; the
allocations from those are delegated back to the host bridge / root
bridge driver, and the current implementation seems sufficient -- it
just assigns blocks from the same big MMIO ( / IO) space downwards.
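
As a rough illustration of what handing out blocks "downwards" from one
shared aperture means, here is a small Python sketch. It is not the
driver's actual code; the class name, the aperture base and size in the
usage note, and the power-of-two alignment handling are all my
assumptions.

```python
class TopDownAperture:
    """Hand out aligned blocks from the top of one shared address window.

    Hypothetical model for illustration; not taken from the edk2 sources.
    """

    def __init__(self, base, size):
        self.base = base
        self.next_top = base + size  # exclusive upper limit of free space

    def allocate(self, length, alignment):
        # Assumption: alignment is a power of two, as for PCI BARs.
        if alignment <= 0 or alignment & (alignment - 1):
            raise ValueError("alignment must be a power of two")
        # Step down by the block length, then round down to the alignment.
        start = (self.next_top - length) & ~(alignment - 1)
        if start < self.base:
            raise MemoryError("aperture exhausted")
        self.next_top = start
        return start
```

With a made-up window, TopDownAperture(0x80000000, 0x10000000), a 4 KiB
request aligned to 4 KiB lands at 0x8FFFF000, and later requests land
below it, no matter which root bridge they come from.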

Thanks
Laszlo
