Re: Spontaneous reboots when using RX 560
Hi Alex,

> Can you send me a copy of the vbios from that board?

Did you get a chance to look at the BIOS to see if you can find anything interesting in it? (I guess you need some special tools for that, I'm not sure how I'd find anything in there myself.)

After a couple of back-and-forths with ASRock support, they basically just want me to return the card and get another one, which I'm pretty sure isn't going to accomplish anything except wasting 1 or 2 weeks shipping stuff around ...

Cheers,

Sylvain
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: Spontaneous reboots when using RX 560
Hi,

> Can you send me a copy of the vbios from that board?
>
> (as root)
> (use lspci to get the bus id)
> cd /sys/bus/pci/devices/
> echo 1 > rom
> cat rom > /tmp/vbios.rom
> echo 0 > rom

Sure, sent as a private message.

Also, I got hold of an RX570 from another vendor and tested that. It works fine, no crash even during stress tests / benchmarks.

Cheers,

Sylvain
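The quoted dump steps can be wrapped in a small shell function. This is only a sketch: `dump_vbios` and its two arguments are hypothetical names, and the device directory is assumed to be the `/sys/bus/pci/devices/` entry for the bus id reported by lspci.

```shell
# Sketch only: dump the video BIOS of one PCI device through sysfs.
# dump_vbios is a hypothetical helper name, not an existing tool.
# Usage (as root): dump_vbios <device directory> <output file>
dump_vbios() {
    dev_dir=$1   # sysfs directory of the card (assumed: under /sys/bus/pci/devices/)
    out=$2       # where to store the image, e.g. /tmp/vbios.rom

    if [ ! -e "$dev_dir/rom" ]; then
        echo "no rom file under $dev_dir" >&2
        return 1
    fi

    echo 1 > "$dev_dir/rom" || return 1   # enable reads of the ROM
    cat "$dev_dir/rom" > "$out"           # copy the image out
    rc=$?
    echo 0 > "$dev_dir/rom"               # disable reads again, even if the copy failed
    return "$rc"
}
```

Passing the directory as an argument avoids relying on the current working directory between the three steps.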
Re: Spontaneous reboots when using RX 560
Hi All,

More testing over the last few days showed that only the lowest power mode, or one slightly above it, works. Oh, I also tested 5.4-rc3 just in case, but got the same results.

It doesn't seem to be affected by PCIe lane speed. Memory seems stable at 625M and almost stable at 1500M (only a sustained heavy workload eventually brings it down), but the SoC speed seems pretty touchy.

So that would seem to confirm something is wrong either in the power play table itself, or in its interpretation by the Linux driver.

I tried brute-loading another RX570 pptable into it, but that didn't really do much. After writing it to pp_table, the card was stuck at its lowest clock mode. It worked fine, but the same as if I had forced it to low power.

Is there any way to extract the power play table from Windows, since it's running fine there?

I'm kind of running out of ideas for what to try next.

Cheers,

Sylvain
Re: Spontaneous reboots when using RX 560
Finally some progress!

I found a thread with a couple of people having the same symptoms as I do ( [1] ), and interestingly that was with the same brand & model of card.

Although there is no solution, there is a workaround that works:

echo -n low > /sys/class/drm/card0/device/power_dpm_force_performance_level

Then the card seems stable. At least I was able to get through an entire GL benchmark and also a bunch of CL tests without crashing. (By default it crashes nearly instantly.) Of course the card is slow, but it's better than nothing and maybe gives a clue to a solution?

Following some advice on IRC, I also tried setting it to "high". It doesn't crash immediately when I do that: the display stays fine and I can move windows and do light stuff, but as soon as I actually run GL or CL stuff, it crashes.

I also dumped the Power Play tables, see [2]. I can't really understand them. There are definitely some weird values, but I'm not sure if that's normal or not.

As I noted earlier in the thread, when I first used the card on Windows with just AMD's driver, the card was stuck at its lowest clock rate and performed poorly in benchmarks. It was only after I loaded ASRock's own tweak utility that the card started to auto-adapt its clocks / voltages. Not sure if there is a way to dump the Windows power play config?

Cheers,

Sylvain

[1] https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1112121-rx-560-crash-under-light-load
[2] https://pastebin.com/raw/uWh6WLmh
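For repeated testing, the workaround can be scripted so the level is validated and read back after each write. This is a rough sketch, not an official tool: `set_perf_level` is a hypothetical name, and the accepted values are limited to the two tried in this thread plus "auto", which is assumed here to be the driver default.

```shell
# Sketch only: write a forced performance level and read it back.
# set_perf_level is a hypothetical helper name.
# Usage (as root):
#   set_perf_level /sys/class/drm/card0/device/power_dpm_force_performance_level low
set_perf_level() {
    level_file=$1
    level=$2

    # Only the levels discussed in the thread, plus "auto" (assumed default).
    case "$level" in
        auto|low|high) ;;
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac

    # Same effect as "echo -n": write the level without a trailing newline.
    printf '%s' "$level" > "$level_file" || return 1

    # Print what the file now reports so the caller can confirm the change.
    cat "$level_file"
}
```

Calling it with "auto" afterwards should hand control back to the driver, assuming the default was "auto" to begin with.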
Re: Spontaneous reboots when using RX 560
Just in case there was any doubt, it seems an OpenCL workload crashes the card just as hard.

(That was with the AMDGPU-Pro OpenCL lib, legacy version. I can't get PAL to detect the card at all.)

Cheers,

Sylvain
Re: Spontaneous reboots when using RX 560
nvidia_drm(POE) amdgpu nvidia_modeset(POE) snd_hda_intel snd_seq_midi ghash_clmulni_intel nvidia(POE) aesni_intel snd_hda_codec snd_seq_midi_event snd_hda_core aes_x86_64 snd_rawmidi amd_iommu_v2 crypto_simd gpu_sched cryptd joydev input_leds wmi_bmof snd_hwdep snd_seq glue_helper ttm snd_pcm ucsi_ccg drm_kms_helper typec_ucsi snd_seq_device typec drm ccp ipmi_devintf snd_timer ipmi_msghandler snd fb_sys_fops syscopyarea sysfillrect sysimgblt soundcore mac_hid sch_fq_codel nct6775 hwmon_vid parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid ixgbe i2c_piix4 igb nvme ahci i2c_nvidia_gpu libahci xfrm_algo i2c_algo_bit nvme_core dca mdio wmi
[ 89.463704] ---[ end trace 455cf9a155c384cb ]---

The "To Be Filled By O.E.M. To Be Filled By O.E.M./" really inspires confidence ...

Cheers,

Sylvain Munaut
Re: Spontaneous reboots when using RX 560
Hi Christian,

> I would also test if disabling power features helps as well, try to add
> amdgpu.pg_mask=0 and amdgpu.cg_mask=0 to the kernel command line for
> example.

Thanks for the suggestion. Just tried this, no luck. Also tried 'runpm=0' (but apparently that's for laptops only, so ...).

Even with cg_mask=0, I still see this in amdgpu_pm_info. I'm not sure if that's expected or if somehow the option was ignored?

Clock Gating Flags Mask: 0x16b00
Graphics Medium Grain Clock Gating: Off
Graphics Medium Grain memory Light Sleep: Off
Graphics Coarse Grain Clock Gating: Off
Graphics Coarse Grain memory Light Sleep: Off
Graphics Coarse Grain Tree Shader Clock Gating: Off
Graphics Coarse Grain Tree Shader Light Sleep: Off
Graphics Command Processor Light Sleep: Off
Graphics Run List Controller Light Sleep: Off
Graphics 3D Coarse Grain Clock Gating: Off
Graphics 3D Coarse Grain memory Light Sleep: Off
Memory Controller Light Sleep: On
Memory Controller Medium Grain Clock Gating: On
System Direct Memory Access Light Sleep: Off
System Direct Memory Access Medium Grain Clock Gating: On
Bus Interface Medium Grain Clock Gating: Off
Bus Interface Light Sleep: Off
Unified Video Decoder Medium Grain Clock Gating: On
Video Compression Engine Medium Grain Clock Gating: On
Host Data Path Light Sleep: Off
Host Data Path Medium Grain Clock Gating: On
Digital Right Management Medium Grain Clock Gating: Off
Digital Right Management Light Sleep: Off
Rom Medium Grain Clock Gating: Off
Data Fabric Medium Grain Clock Gating: Off
Address Translation Hub Medium Grain Clock Gating: Off
Address Translation Hub Light Sleep: Off

GFX Clocks and Power:
    300 MHz (MCLK)
    214 MHz (SCLK)
    387 MHz (PSTATE_SCLK)
    625 MHz (PSTATE_MCLK)
    775 mV (VDDGFX)
    7.254 W (average GPU)

GPU Temperature: 34 C
GPU Load: 0 %
MEM Load: 6 %

UVD: Disabled
VCE: Disabled

I'm not really sure what to try next.

Unfortunately I don't have access to any other card or any other motherboard I could use to test :/ (Or anything fancy like a PCIe bus analyzer.)

My understanding of the first error message that shows up is that the card itself tries to access a memory zone it's not allowed to, right?

[ 144.311704] amdgpu :06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xa076010100 flags=0x0010]

Cheers,

Sylvain
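One way to rule out the option having been ignored is to check that it actually reached the kernel command line. The helper below is a minimal sketch with a hypothetical name (`cmdline_has`). It assumes the running command line is read from /proc/cmdline. Comparing against /sys/module/amdgpu/parameters/cg_mask is a further assumption about how the driver exposes its parameters.

```shell
# Sketch only: check whether an exact word is present on a kernel command line.
# cmdline_has is a hypothetical helper name.
# Usage: cmdline_has /proc/cmdline amdgpu.cg_mask=0 && echo "option was passed"
cmdline_has() {
    cmdline_file=$1
    wanted=$2

    # Put each space-separated word on its own line, then look for an exact,
    # literal, whole-line match (-x whole line, -F fixed string, -q quiet).
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$wanted"
}

# Assumption: the driver may also expose the value it ended up with, e.g.
#   cat /sys/module/amdgpu/parameters/cg_mask
```

The match is deliberately exact, so `amdgpu.cg_mask` on its own does not match `amdgpu.cg_mask=0`.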
Re: Spontaneous reboots when using RX 560
So, a bit more testing.

I was using a somewhat "unusual" config I guess, having 2 GPUs and some other PCIe cards (10G, ...). So I simplified and went to the most standard thing I could think of: _just_ the RX 560 card plugged into the main PCIe 16x slot directly connected to the CPU.

And the exact same results, no change in behavior.

So on one hand I'm happy that the other cards and having the AMD GPU in the second slot aren't the issue (because I really need the config that way), but on the other hand I'm no closer to finding the issue :/

Cheers,

Sylvain Munaut
Re: Spontaneous reboots when using RX 560
> From the hardware point of view the only thing which comes to mind is
> that you somehow triggered the ESD protection.
>
> I assume you can rule out an unstable physical connection (because it
> works on windows), so the only thing left is that there is something
> very very badly going wrong with power management.
>
> Have you "tuned" the power tables on the board somehow?

Nope, not at all.

In Windows, I had actually noticed that before I installed the ASRock utility for the card, it was staying at its lowest clock. I had the Radeon / AMD drivers installed of course, but not the vendor tools for the board. Once I installed that, it started automatically going to higher power states as the load varied. And it's set to the "default" profile.

On Linux I haven't done anything. Just a fresh Ubuntu 19.10 install with amdgpu loaded. Not sure if I have anything else to do. I'm not even sure how to monitor the card frequency / voltage on Linux.

> Or maybe multiple GPUs connected to the same power supply?

That machine has another GPU, an NVidia one in the first x16 slot. The NVidia GPU has a PCIe power connector going to it.

The RX 560 board ( https://www.asrock.com/Graphics-Card/AMD/Phantom%20Gaming%20Radeon%20RX560%202G/ ) doesn't have any additional PCIe power input, so it gets all its power from the PCIe slot itself.

The PC has a good quality 650W Corsair power supply. During all these tests the NVidia GPU was idle (not even an X server launched on it), and the PSU fan didn't even spin up (it doesn't spin if power is < 350 W), so I think it has plenty of margin.

Cheers,

Sylvain
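On monitoring the card frequency: one possible approach, sketched below, is to read the amdgpu DPM state files in sysfs. It assumes the card exposes files such as /sys/class/drm/card0/device/pp_dpm_sclk whose lines look like "0: 214Mhz *", with the asterisk marking the active state. `active_clock` is a hypothetical helper name.

```shell
# Sketch only: print the clock of the DPM state currently marked active.
# active_clock is a hypothetical helper name; the "N: <clock> *" line format
# is an assumption about the pp_dpm_* files.
# Usage: active_clock /sys/class/drm/card0/device/pp_dpm_sclk
active_clock() {
    # Select the line whose last field is "*" and print its second field.
    awk '$NF == "*" { print $2 }' "$1"
}
```

Running it in a loop, for example with `watch`, would show the clock changing with load. Voltage and power readings like the ones quoted elsewhere in this thread come from amdgpu_pm_info, which is assumed to live under /sys/kernel/debug/dri/0/.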
Re: Spontaneous reboots when using RX 560
Hi,

> > I have RX 560 2G card. It's plugged into a 16x physical / 4x
> > electrical slot of a X570 chipset motherboard with a Ryzen 3700X CPU.
> > The hardware works fine and is stable under Windows (tested with
> > games, benchmarks, stress-tests, ...)
>
> Does booting with pci=noats on the kernel command line in grub fix the issue?

It doesn't :/

The message is slightly different, but it's the same idea:

[ 83.704035] amdgpu :06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0x0 flags=0x0020]
[ 88.732685] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
[ 92.074379] ixgbe :04:00.1: Adapter removed
[ 93.480989] igb :07:00.0 enp7s0: PCIe link lost

So it screws up PCIe very badly :/ Specifically, it seems to affect everything connected to the X570 chipset.

Cheers,

Sylvain
Spontaneous reboots when using RX 560
Hi,

I have an RX 560 2G card. It's plugged into a 16x physical / 4x electrical slot of an X570 chipset motherboard with a Ryzen 3700X CPU.

The hardware works fine and is stable under Windows (tested with games, benchmarks, stress-tests, ...).

But when trying, for instance, Steam under Linux, or even just the 'app launcher' from Ubuntu that has some visual effects, the machine will instantly reboot. Also, after the reboot, the GPU is no longer detected (lspci doesn't show it, and under Windows it's nowhere to be seen either). The machine needs to be physically turned off and turned back on for it to work again.

I added a serial console to try to get some output, and when doing that it doesn't immediately reboot (but the rest is the same: the machine is unusable and after a reboot the GPU is not present anymore until a power-off).

This is the output I get:

[ 144.311704] amdgpu :06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xa076010100 flags=0x0010]
[ 144.322734] amdgpu :06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xa076230100 flags=0x0010]
[ 144.333751] amdgpu :06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xa076030100 flags=0x0010]
[ 147.028625] AMD-Vi: Completion-Wait loop timed out
[ 147.206336] AMD-Vi: Completion-Wait loop timed out
[ 147.368260] AMD-Vi: Completion-Wait loop timed out
[ 147.532296] AMD-Vi: Completion-Wait loop timed out
[ 147.703269] AMD-Vi: Completion-Wait loop timed out
[ 147.845840] AMD-Vi: Completion-Wait loop timed out
[ 147.860950] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x81b1c1e60]
[ 148.015778] AMD-Vi: Completion-Wait loop timed out
[ 148.187270] AMD-Vi: Completion-Wait loop timed out

(and then it seems to loop infinitely, always printing that).

I tried:

Ubuntu 19.10 with 5.3.0-18-generic
Ubuntu 19.04 with 5.0.0-31-generic
A DKMS module from AMDGPU-PRO 19.30, patched to build and load under 5.3.0

All give the same result.

Cheers,

Sylvain Munaut
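When comparing runs, it can help to pull just the faulting addresses out of a captured log. The filter below is a small sketch with a hypothetical name (`fault_addresses`). It assumes the log lines keep the `IO_PAGE_FAULT ... address=0x...` shape shown above.

```shell
# Sketch only: print the address of every IO_PAGE_FAULT event in a log.
# fault_addresses is a hypothetical helper name. It reads the files given
# as arguments, or standard input when called without arguments.
# Usage: dmesg | fault_addresses | sort | uniq -c
fault_addresses() {
    # Keep only IO_PAGE_FAULT lines and reduce each one to its address= value.
    sed -n 's/.*IO_PAGE_FAULT.*address=\(0x[0-9a-fA-F]*\).*/\1/p' "$@"
}
```

Other events such as IOTLB_INV_TIMEOUT are skipped on purpose, since they carry an address field with a different meaning.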