I think you are overcomplicating things. Just try and get X running on just the AMD GPU on bare metal. Introducing virtualization is just adding more uncertainty. If you can't configure X to not use the integrated GPU, just blacklist the i915 driver (append modprobe.blacklist=i915 to the kernel command line in grub) and X should come up on the dGPU.
Alex On Wed, May 20, 2020 at 6:05 PM Javad Karabi <karabija...@gmail.com> wrote: > > Thanks Alex, > Here's my plan: > > since my laptop's os is pretty customized, e.g. compiling my own kernel, > building latest xorg, latest xorg-driver-amdgpu, etc etc, > im going to use the intel iommu and pass through my rx 5600 into a virtual > machine, which will be a 100% stock ubuntu installation. > then, inside that vm, i will continue to debug > > does that sound like it would make sense for testing? for example, with that > scenario, it adds the iommu into the mix, so who knows if that causes > performance issues. but i think its worth a shot, to see if a stock kernel > will handle it better > > also, quick question: > from what i understand, a thunderbolt 3 pci express connection should handle > 8 GT/s x4, however, along the chain of bridges to my device, i notice that > the bridge closest to the graphics card is at 2.5 GT/s x4, and it also says > "downgraded" (this is via the lspci output) > > now, when i boot into windows, it _also_ says 2.5 GT/s x4, and it runs > extremely well. no issues at all. > > so my question is: the fact that the bridge is at 2.5 GT/s x4, and not at its > theoretical "full link speed" of 8 GT/s x4, do you suppose that _could_ be an > issue? > i do not think so, because, like i said, in windows it also reports that link > speed. > i would assume that you would want the fastest link speed possible, because i > would assume that of _all_ tb3 pci express devices, a GPU would be the #1 > most demanding on the link > > just curious if you think 2.5 GT/s could be the bottleneck > > i will pass through the device into a ubuntu vm and let you know how it goes. > thanks > > > > On Tue, May 19, 2020 at 9:29 PM Alex Deucher <alexdeuc...@gmail.com> wrote: >> >> On Tue, May 19, 2020 at 9:16 PM Javad Karabi <karabija...@gmail.com> wrote: >> > >> > thanks for the answers alex. >> > >> > so, i went ahead and got a displayport cable to see if that changes >> > anything. and now, when i run monitor only, and the monitor connected >> > to the card, it has no issues like before! so i am thinking that >> > somethings up with either the hdmi cable, or some hdmi related setting >> > in my system? who knows, but im just gonna roll with only using >> > displayport cables now. >> > the previous hdmi cable was actually pretty long, because i was >> > extending it with an hdmi extension cable, so maybe the signal was >> > really bad or something :/ >> > >> > but yea, i guess the only real issue now is maybe something simple >> > related to some sysfs entry about enabling some powermode, voltage, >> > clock frequency, or something, so that glxgears will give me more than >> > 300 fps. but atleast now i can use a single monitor configuration with >> > the monitor displayported up to the card. >> > >> >> The GPU dynamically adjusts the clocks and voltages based on load. No >> manual configuration is required. >> >> At this point, we probably need to see you xorg log and dmesg output >> to try and figure out exactly what is going on. I still suspect there >> is some interaction going on with both GPUs and the integrated GPU >> being the primary, so as I mentioned before, you should try and run X >> on just the amdgpu rather than trying to use both of them. >> >> Alex >> >> >> > also, one other thing i think you might be interested in, that was >> > happening before. >> > >> > so, previously, with laptop -tb3-> egpu-hdmi> monitor, there was a >> > funny thing happening which i never could figure out. >> > when i would look at the X logs, i would see that "modesetting" (for >> > the intel integrated graphics) was reporting that MonitorA was used >> > with "eDP-1", which is correct and what i expected. >> > when i scrolled further down, i then saw that "HDMI-A-1-2" was being >> > used for another MonitorB, which also is what i expected (albeit i >> > have no idea why its saying A-1-2) >> > but amdgpu was _also_ saying that DisplayPort-1-2 (a port on the >> > radeon card) was being used for MonitorA, which is the same Monitor >> > that the modesetting driver had claimed to be using with eDP-1! >> > >> > so the point is that amdgpu was "using" Monitor0 with DisplayPort-1-2, >> > although that is what modesetting was using for eDP-1. >> > >> > anyway, thats a little aside, i doubt it was related to the terrible >> > hdmi experience i was getting, since its about display port and stuff, >> > but i thought id let you know about that. >> > >> > if you think that is a possible issue, im more than happy to plug the >> > hdmi setup back in and create an issue on gitlab with the logs and >> > everything >> > >> > On Tue, May 19, 2020 at 4:42 PM Alex Deucher <alexdeuc...@gmail.com> wrote: >> > > >> > > On Tue, May 19, 2020 at 5:22 PM Javad Karabi <karabija...@gmail.com> >> > > wrote: >> > > > >> > > > lol youre quick! >> > > > >> > > > "Windows has supported peer to peer DMA for years so it already has a >> > > > numbers of optimizations that are only now becoming possible on Linux" >> > > > >> > > > whoa, i figured linux would be ahead of windows when it comes to >> > > > things like that. but peer-to-peer dma is something that is only >> > > > recently possible on linux, but has been possible on windows? what >> > > > changed recently that allows for peer to peer dma in linux? >> > > > >> > > >> > > A few things that made this more complicated on Linux: >> > > 1. Linux uses IOMMUs more extensively than windows so you can't just >> > > pass around physical bus addresses. >> > > 2. Linux supports lots of strange architectures that have a lot of >> > > limitations with respect to peer to peer transactions >> > > >> > > It just took years to get all the necessary bits in place in Linux and >> > > make everyone happy. >> > > >> > > > also, in the context of a game running opengl on some gpu, is the >> > > > "peer-to-peer" dma transfer something like: the game draw's to some >> > > > memory it has allocated, then a DMA transfer gets that and moves it >> > > > into the graphics card output? >> > > >> > > Peer to peer DMA just lets devices access another devices local memory >> > > directly. So if you have a buffer in vram on one device, you can >> > > share that directly with another device rather than having to copy it >> > > to system memory first. For example, if you have two GPUs, you can >> > > have one of them copy it's content directly to a buffer in the other >> > > GPU's vram rather than having to go through system memory first. >> > > >> > > > >> > > > also, i know it can be super annoying trying to debug an issue like >> > > > this, with someone like me who has all types of differences from a >> > > > normal setup (e.g. using it via egpu, using a kernel with custom >> > > > configs and stuff) so as a token of my appreciation i donated 50$ to >> > > > the red cross' corona virus outbreak charity thing, on behalf of >> > > > amd-gfx. >> > > >> > > Thanks, >> > > >> > > Alex >> > > >> > > > >> > > > On Tue, May 19, 2020 at 4:13 PM Alex Deucher <alexdeuc...@gmail.com> >> > > > wrote: >> > > > > >> > > > > On Tue, May 19, 2020 at 3:44 PM Javad Karabi <karabija...@gmail.com> >> > > > > wrote: >> > > > > > >> > > > > > just a couple more questions: >> > > > > > >> > > > > > - based on what you are aware of, the technical details such as >> > > > > > "shared buffers go through system memory", and all that, do you see >> > > > > > any issues that might exist that i might be missing in my setup? i >> > > > > > cant imagine this being the case because the card works great in >> > > > > > windows, unless the windows driver does something different? >> > > > > > >> > > > > >> > > > > Windows has supported peer to peer DMA for years so it already has a >> > > > > numbers of optimizations that are only now becoming possible on >> > > > > Linux. >> > > > > >> > > > > > - as far as kernel config, is there anything in particular which >> > > > > > _should_ or _should not_ be enabled/disabled? >> > > > > >> > > > > You'll need the GPU drivers for your devices and dma-buf support. >> > > > > >> > > > > > >> > > > > > - does the vendor matter? for instance, this is an xfx card. when >> > > > > > it >> > > > > > comes to different vendors, are there interface changes that might >> > > > > > make one vendor work better for linux than another? i dont really >> > > > > > understand the differences in vendors, but i imagine that the vbios >> > > > > > differs between vendors, and as such, the linux compatibility would >> > > > > > maybe change? >> > > > > >> > > > > board vendor shouldn't matter. >> > > > > >> > > > > > >> > > > > > - is the pcie bandwidth possible an issue? the pcie_bw file changes >> > > > > > between values like this: >> > > > > > 18446683600662707640 18446744071581623085 128 >> > > > > > and sometimes i see this: >> > > > > > 4096 0 128 >> > > > > > as you can see, the second value seems significantly lower. is that >> > > > > > possibly an issue? possibly due to aspm? >> > > > > >> > > > > pcie_bw is not implemented for navi yet so you are just seeing >> > > > > uninitialized data. This patch set should clear that up. >> > > > > https://patchwork.freedesktop.org/patch/366262/ >> > > > > >> > > > > Alex >> > > > > >> > > > > > >> > > > > > On Tue, May 19, 2020 at 2:20 PM Javad Karabi >> > > > > > <karabija...@gmail.com> wrote: >> > > > > > > >> > > > > > > im using Driver "amdgpu" in my xorg conf >> > > > > > > >> > > > > > > how does one verify which gpu is the primary? im assuming my >> > > > > > > intel >> > > > > > > card is the primary, since i have not done anything to change >> > > > > > > that. >> > > > > > > >> > > > > > > also, if all shared buffers have to go through system memory, >> > > > > > > then >> > > > > > > that means an eGPU amdgpu wont work very well in general right? >> > > > > > > because going through system memory for the egpu means going >> > > > > > > over the >> > > > > > > thunderbolt connection >> > > > > > > >> > > > > > > and what are the shared buffers youre referring to? for example, >> > > > > > > if an >> > > > > > > application is drawing to a buffer, is that an example of a >> > > > > > > shared >> > > > > > > buffer that has to go through system memory? if so, thats fine, >> > > > > > > right? >> > > > > > > because the application's memory is in system memory, so that >> > > > > > > copy >> > > > > > > wouldnt be an issue. >> > > > > > > >> > > > > > > in general, do you think the "copy buffer across system memory >> > > > > > > might >> > > > > > > be a hindrance for thunderbolt? im trying to figure out which >> > > > > > > directions to go to debug and im totally lost, so maybe i can do >> > > > > > > some >> > > > > > > testing that direction? >> > > > > > > >> > > > > > > and for what its worth, when i turn the display "off" via the >> > > > > > > gnome >> > > > > > > display settings, its the same issue as when the laptop lid is >> > > > > > > closed, >> > > > > > > so unless the motherboard reads the "closed lid" the same as >> > > > > > > "display >> > > > > > > off", then im not sure if its thermal issues. >> > > > > > > >> > > > > > > On Tue, May 19, 2020 at 2:14 PM Alex Deucher >> > > > > > > <alexdeuc...@gmail.com> wrote: >> > > > > > > > >> > > > > > > > On Tue, May 19, 2020 at 2:59 PM Javad Karabi >> > > > > > > > <karabija...@gmail.com> wrote: >> > > > > > > > > >> > > > > > > > > given this setup: >> > > > > > > > > laptop -thunderbolt-> razer core x -> xfx rx 5600 xt raw 2 >> > > > > > > > > -hdmi-> monitor >> > > > > > > > > DRI_PRIME=1 glxgears gears gives me ~300fps >> > > > > > > > > >> > > > > > > > > given this setup: >> > > > > > > > > laptop -thunderbolt-> razer core x -> xfx rx 5600 xt raw 2 >> > > > > > > > > laptop -hdmi-> monitor >> > > > > > > > > >> > > > > > > > > glx gears gives me ~1800fps >> > > > > > > > > >> > > > > > > > > this doesnt make sense to me because i thought that having >> > > > > > > > > the monitor >> > > > > > > > > plugged directly into the card should give best performance. >> > > > > > > > > >> > > > > > > > >> > > > > > > > Do you have displays connected to both GPUs? If you are using >> > > > > > > > X which >> > > > > > > > ddx are you using? xf86-video-modesetting or >> > > > > > > > xf86-video-amdgpu? >> > > > > > > > IIRC, xf86-video-amdgpu has some optimizations for prime which >> > > > > > > > are not >> > > > > > > > yet in xf86-video-modesetting. Which GPU is set up as the >> > > > > > > > primary? >> > > > > > > > Note that the GPU which does the rendering is not necessarily >> > > > > > > > the one >> > > > > > > > that the displays are attached to. The render GPU renders to >> > > > > > > > it's >> > > > > > > > render buffer and then that data may end up being copied other >> > > > > > > > GPUs >> > > > > > > > for display. Also, at this point, all shared buffers have to >> > > > > > > > go >> > > > > > > > through system memory (this will be changing eventually now >> > > > > > > > that we >> > > > > > > > support device memory via dma-buf), so there is often an extra >> > > > > > > > copy >> > > > > > > > involved. >> > > > > > > > >> > > > > > > > > theres another really weird issue... >> > > > > > > > > >> > > > > > > > > given setup 1, where the monitor is plugged in to the card: >> > > > > > > > > when i close the laptop lid, my monitor is "active" and >> > > > > > > > > whatnot, and i >> > > > > > > > > can "use it" in a sense >> > > > > > > > > >> > > > > > > > > however, heres the weirdness: >> > > > > > > > > the mouse cursor will move along the monitor perfectly >> > > > > > > > > smooth and >> > > > > > > > > fine, but all the other updates to the screen are delayed by >> > > > > > > > > about 2 >> > > > > > > > > or 3 seconds. >> > > > > > > > > that is to say, its as if the laptop is doing everything >> > > > > > > > > (e.g. if i >> > > > > > > > > open a terminal, the terminal will open, but it will take 2 >> > > > > > > > > seconds >> > > > > > > > > for me to see it) >> > > > > > > > > >> > > > > > > > > its almost as if all the frames and everything are being >> > > > > > > > > drawn, and >> > > > > > > > > the laptop is running fine and everything, but i simply just >> > > > > > > > > dont get >> > > > > > > > > to see it on the monitor, except for one time every 2 >> > > > > > > > > seconds. >> > > > > > > > > >> > > > > > > > > its hard to articulate, because its so bizarre. its not >> > > > > > > > > like, a "low >> > > > > > > > > fps" per se, because the cursor is totally smooth. but its >> > > > > > > > > that >> > > > > > > > > _everything else_ is only updated once every couple seconds. >> > > > > > > > >> > > > > > > > This might also be related to which GPU is the primary. It >> > > > > > > > still may >> > > > > > > > be the integrated GPU since that is what is attached to the >> > > > > > > > laptop >> > > > > > > > panel. Also the platform and some drivers may do certain >> > > > > > > > things when >> > > > > > > > the lid is closed. E.g., for thermal reasons, the integrated >> > > > > > > > GPU or >> > > > > > > > CPU may have a more limited TDP because the laptop cannot cool >> > > > > > > > as >> > > > > > > > efficiently. >> > > > > > > > >> > > > > > > > Alex _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx