On Tue, Nov 13, 2018 at 11:20 AM, Radoslaw Szkodzinski <astralst...@gmail.com> wrote:
> On Tue, Nov 13, 2018 at 12:02 PM John Mitchell <sonwo...@gmail.com> wrote:
>> On Tuesday, November 13, 2018 at 5:01:06 AM UTC+1, Radosław Szkodziński wrote:
>> > On Thu, Nov 8, 2018 at 11:48 AM Marek Marczykowski-Górecki
>> > <marmarek@<snip>> wrote:
>> > > Yes, the main problem with GPU passthrough is complexity - those
>> > > devices abuse the PCI specification in so many ways you wouldn't
>> > > believe... There are some workarounds, but most require qemu
>> > > running in dom0, which is a security issue.
>> >
>> > The main actual problem - given that you would never keep the
>> > passed-through GPU in dom0 (it stays bound to vfio-pci) - is how to
>> > reset the device.
>> > There are known bugs in the Vega and Polaris implementations of PCIe
>> > reset, so those cards do not reset correctly - and additionally some
>> > mainboards do not reprogram the PCIe configuration space for
>> > secondary bridges.
>> > Typically the GPU is passed in as a secondary card, skipping the
>> > whole VGA passthrough business.
>> > With older driver versions you also had to specify a ROM file
>> > (containing a dump of the video ROM), but the latest ones do not
>> > require it, as they have a fallback interface to access AtomBIOS.
>> >
>> > nVidia, on the other hand, strictly requires hiding the VM as much
>> > as possible unless you own a Quadro, and you always have to add the
>> > video ROM - but if that succeeds, it just works.
>> >
>> > On top of that, some IOMMU implementations do not reset devices
>> > fully either.
>> > Passing the GPU between VM instances securely requires believing
>> > that device reset for such a complex device has no data leaks...
>> > If you never remove the GPU from the VM and never stop the VM using
>> > it, this shouldn't be a problem - but ensuring that in the Windows
>> > world is pretty hard with those silly automatic updates.
>> > An always-on GPU VM that is never reset, sitting behind some sort of
>> > proxy, could avoid this issue. I think the Looking Glass project has
>> > some reasonable software that can be investigated for this.
>> >
>> > R.
>>
>> Thank you for the informative reply.
>>
>> It appears the Vega reset bug is related to the VM presenting the
>> card as a legacy PCI device rather than a PCIe device.
>
> AMD has said that Vega needs a PSP reset issued before the IOMMU is
> disabled; otherwise its internal bus ends up thoroughly messed up and
> unavailable.
> Alternatively, you can quirk this device so it is never reset or put
> into D3 at all; it will then be reset by the driver on the next guest
> boot, but that has some security implications.
>
> I'm not entirely sure what is currently wrong with the IOMMU on my TR4
> X399, but 4.17 works, while 4.18+ kernels fail completely on reset,
> even with the new AGESA, which does attempt to rewrite the PCIe option
> ROM.
> I suspect this commit may fix it:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c120143f584360a13614787e23ae2cdcb5e5ccd
> If not, I'll go bisecting.
>
> The card is also messed up after S3 if it has been passed through; the
> commit above is suspected there as well. Otherwise S3 actually fully
> resets the GPU, as tested when the GPU is booted by EFI - in that case
> you can use S3 to reset it - but not if you pass it through the IOMMU
> somewhere. Likely the same TLB/IOVA handling bug: the TLB/IOVA cannot
> be touched during S3 resume.
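For anyone wanting to try this at home, the vfio-pci recipe sketched
above looks roughly like the following. Treat it as a sketch, not
gospel: the PCI address 0000:0a:00.0, the 1002:687f vendor:device pair,
and the ROM path are placeholders for your own hardware.

  # Unbind the GPU from its host driver and hand it to vfio-pci.
  modprobe vfio-pci
  echo 0000:0a:00.0 > /sys/bus/pci/devices/0000:0a:00.0/driver/unbind
  echo 1002 687f > /sys/bus/pci/drivers/vfio-pci/new_id

  # See which reset mechanisms the card actually advertises; the
  # Vega/Polaris trouble starts when only a bus reset is available.
  lspci -vvs 0a:00.0 | grep -i reset

  # QEMU sketch: romfile= feeds older guest drivers the video ROM dump,
  # and kvm=off (plus a bogus Hyper-V vendor id) is the usual trick for
  # hiding the VM from the nVidia driver.
  qemu-system-x86_64 -machine q35,accel=kvm -m 8G \
      -cpu host,kvm=off,hv_vendor_id=0123456789ab \
      -device vfio-pci,host=0a:00.0,romfile=/path/to/vbios.rom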
> At least 4.20-rc gained some real debugfs support for the AMD IOMMU,
> which will help in finding out what's wrong.
>
>> Looking Glass appears to be at the alpha stage.
>
> Yes, and it is not quite the right way around - it lets you show a
> headless Windows guest on a Linux or Windows target. Mostly useful for
> not having to run a monitor cable to the GPU.
> It seems to be doing mostly what the Qubes driver is doing? It works
> well enough if you want the VM visible in a window or fullscreen on
> another GPU.
> Perhaps it even works when the passed-through GPU is a laptop's
> secondary GPU, since that needs no output mapping - and thus neither
> framebuffer mapping nor a hardware output mux and vga_switcheroo.
>
> Another way would be to use Virgil 3D and write a virtual Windows
> driver for it, or extend the Qubes one: https://virgil3d.github.io/
> Make sure to still put the GPU and vfio-virgl in a VM for any
> security, scrape the GPU framebuffer from there, and redisplay it
> safely in the host after improving the Qubes framebuffer proxy to
> handle vsync better.
> That option would run the GPU code in a Linux VM behind the proxy.
> It has relatively high potential for bugs and issues, though, as the
> shader implementation will likely have compatibility problems... On
> the other hand, it might be harder to exploit, since it acts as a sort
> of proxy.
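For reference, the two plumbing options mentioned above look roughly
like this on the QEMU side. Again a sketch: the shared-memory size and
paths are illustrative, the ivshmem wiring follows Looking Glass's
documented setup, and the virgl variant assumes a QEMU built with
virglrenderer.

  # Looking Glass style: the guest GPU copies its framebuffer into a
  # shared-memory region that a host-side client then redisplays.
  qemu-system-x86_64 -machine q35,accel=kvm -m 8G \
      -object memory-backend-file,id=ivshmem,share=on,mem-path=/dev/shm/looking-glass,size=32M \
      -device ivshmem-plain,memdev=ivshmem

  # Virgil 3D style: no hardware passthrough at all - the guest gets a
  # paravirtual GPU and its OpenGL command stream is replayed by
  # virglrenderer on the host (or, as suggested above, in a GPU VM).
  qemu-system-x86_64 -machine q35,accel=kvm -m 8G \
      -device virtio-vga,virgl=on \
      -display sdl,gl=on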
I would be very happy to help fund development in this area if anyone
qualified & serious is interested.

Regards,
Jean-Philippe