On 2015-03-05 08:08, Gerd Hoffmann wrote:
> On Mi, 2015-03-04 at 19:12 +0000, Gordan Bobic wrote:
>> On 2015-03-04 13:20, Gerd Hoffmann wrote:
>>> On Di, 2015-03-03 at 10:32 +0000, Gordan Bobic wrote:
>>>> I need to pass a custom e820 map to a virtual machine for
>>>> troubleshooting purposes and working around IOMMU hardware
>>>> bugs.
>>>>
>>>> I have found references to a custom map being providable
>>>> via an external file, mentioned as "etc/e820" and "fw_cfg".
>>>
>>> That is the (filesystem-like) interface between qemu and firmware
>>> (seabios usually); it doesn't refer to an on-disk file.
>>>
>>>> Unfortunately, I have not found any documentation that
>>>> explains how to use this from userspace when invoking
>>>> qemu.
>>>
>>> You can't.
>>>
>>> Passing a different e820 map requires patching qemu (or seabios, which
>>> mangles the e820 table to add reservations for acpi etc).
>>>
>>> What exactly do you need?
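
An aside for anyone else who finds this thread in the archives:
as far as I can tell, "etc/e820" is a file that QEMU synthesises
and exposes to the firmware over the fw_cfg interface at boot,
and its payload is just a packed, little-endian array of records
shaped like the struct below (which mirrors struct e820_entry in
QEMU's hw/i386/pc.c; sketch for illustration only).

  #include <stdint.h>

  /* One record of the "etc/e820" fw_cfg file, as I understand it
   * from QEMU's sources; all fields are little-endian. */
  struct e820_entry {
      uint64_t address;  /* start of the range            */
      uint64_t length;   /* size of the range in bytes    */
      uint32_t type;     /* 1 = usable RAM, 2 = reserved  */
  } __attribute__((packed));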

>> Thank you for responding. The situation I have is that my PCIe
>> bridges are buggy and they seem to bypass the upstream PCIe
>> hub's IOMMU. The problem is that when the guest accesses RAM
>> in its emulated address space at addresses that overlap PCI
>> I/O memory ranges in the host's address space, what should
>> have ended up in guest RAM ends up trampling over the IOMEM
>> on the host.

> The iommu isn't involved here at all.  When the pci devices are
> accessing host ram via busmaster dma, *this* goes through the
> iommu.  And unless you are trying to use pci device assignment,
> the iommu should not matter at all.

I am using PCI device assignment. I'm passing PCI devices
through to the guest VM.

> What you describe sounds more like a bug in ept/npt/softmmu
> (either kernel driver or hardware).  What machine is this?
> Intel?  Does it have ept?  What happens if you turn off ept?

It's an EVGA SR-2 (Intel Nehalem, 5520 NB), and I am 99% certain
the problem is related to the Nvidia NF200 PCIe multiplexer bridges.
Similar problems seem to have been reported by other people with
different motherboards that have NF200 bridges. The workaround is
usually to put the passthrough GPU on a slot that isn't behind the
NF200, but in my case that is not possible because all 7 PCIe slots
are behind the NF200 bridges.

> I'd also suggest taking this issue to the kvm list.

I'm pretty sure I am dealing with a hardware bug here. I have
a workaround that I know works (mark the host's IOMEM areas
as reserved); I just need a way to get QEMU to adjust the
exposed e820 map accordingly. I will try disabling EPT and
see if that helps, but my understanding is that disabling it
carries a hefty performance penalty, which would not be
incurred if I simply had reserved holes in memory at the
appropriate ranges. That is why the latter solution would be
greatly preferable.
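
For concreteness, the ranges I mean are the host's PCI memory
windows as reported by /proc/iomem. A quick standalone sketch of
pulling them out (a hypothetical helper, not part of any existing
tool; may need root on hardened kernels to see the real addresses):

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      FILE *f = fopen("/proc/iomem", "r");
      char line[256];

      if (!f) {
          perror("/proc/iomem");
          return 1;
      }
      while (fgets(line, sizeof(line), f)) {
          unsigned long long start, end;

          /* Top-level resources are unindented; skip sub-resources. */
          if (line[0] == ' ')
              continue;
          if (sscanf(line, "%llx-%llx", &start, &end) == 2 &&
              strstr(line, "PCI Bus"))
              printf("reserve 0x%llx-0x%llx\n", start, end);
      }
      fclose(f);
      return 0;
  }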

My bodge test patch for Xen's hvmloader simply marked the
entire memory range between the first and last IOMEM-mapped
address on the host (essentially everything between 1.5GB
and 4GB) as reserved. This results in a fully working system,
but because the change wasn't plumbed in everywhere else it
needs to be, up to 2.5GB of RAM goes missing in each VM (the
memory is marked as reserved but no corresponding hole is
punched in the RAM layout). Like I said, it was a quick and
dirty bodge to prove that it would fix the problem.
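
To make the above concrete, this is roughly what the bodge
amounted to, written as a standalone illustration rather than the
actual hvmloader patch (the entry layout mirrors hvmloader's
addr/size/type fields, redefined here so it compiles on its own;
the 1.5GB/4GB bounds are the approximate values from this host):

  #include <stdint.h>
  #include <stdio.h>

  #define E820_RESERVED 2

  struct e820entry {
      uint64_t addr;
      uint64_t size;
      uint32_t type;
  } __attribute__((packed));

  /* Append a "reserved" entry covering [start, end) to an e820 table. */
  static unsigned int reserve_range(struct e820entry *e820, unsigned int nr,
                                    uint64_t start, uint64_t end)
  {
      e820[nr].addr = start;
      e820[nr].size = end - start;
      e820[nr].type = E820_RESERVED;
      return nr + 1;
  }

  int main(void)
  {
      struct e820entry e820[128] = {{0}};
      unsigned int nr = 0;

      /* Host IOMEM window, roughly 1.5GB to 4GB. */
      nr = reserve_range(e820, nr, 0x60000000ULL, 0x100000000ULL);

      printf("%016llx-%016llx reserved\n",
             (unsigned long long)e820[0].addr,
             (unsigned long long)(e820[0].addr + e820[0].size));
      return 0;
  }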

I am currently using OVMF for the guest, and I have a bootable
system (Windows 7 guest) that works OK initially. However, any
access to the indirect BARs (as soon as anything requiring
DirectX happens) locks the entire host up solid; I suspect that
one of the virtual BARs overlaps a physical BAR.

The question is: if a convenient hook for e820 reservations
does not currently exist, would the best place to add them be
QEMU, OVMF/EDK2, or somewhere else entirely?
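
To make the QEMU option concrete: as far as I can tell, QEMU
already has e820_add_entry() in hw/i386/pc.c feeding the
"etc/e820" fw_cfg file that the firmware consumes, so a minimal
QEMU-side change might look roughly like the sketch below. The
helper name, where it would be called from, and how the range
would be supplied (command-line option, config file) are all
hypothetical; only e820_add_entry() and E820_RESERVED are, to my
knowledge, existing QEMU pieces.

  #include <stdint.h>

  /* Provided by QEMU (include/hw/i386/pc.h); repeated here only
   * so the sketch reads on its own. */
  #define E820_RESERVED 2
  int e820_add_entry(uint64_t address, uint64_t length, uint32_t type);

  /* Hypothetical helper: punch a reserved hole matching the host's
   * IOMEM window into the e820 table handed to the guest firmware. */
  static void pc_reserve_host_iomem(uint64_t start, uint64_t end)
  {
      if (end > start) {
          e820_add_entry(start, end - start, E820_RESERVED);
      }
  }

SeaBIOS already consumes etc/e820; whether OVMF honours reserved
entries from it is something I would need to check, which is
partly why I am asking where the right place for this is.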

Gordan
