Re: Spontaneous reboots when using RX 560

2019-10-29 Thread Sylvain Munaut
Hi Alex,

> Can you send me a copy of the vbios from that board?

Did you get a chance to look at the BIOS to see if you can find anything
interesting in it?
(I guess you need some special tools for that; I'm not sure how I'd
find anything in there myself.)

After some back and forth with ASRock support, they basically just want
me to return the card to get another one, which I'm pretty sure isn't
going to accomplish anything except wasting 1 or 2 weeks shipping stuff
around ...

Cheers,

Sylvain
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: Spontaneous reboots when using RX 560

2019-10-24 Thread Sylvain Munaut
Hi,

> Can you send me a copy of the vbios from that board?
>
> (as root)
> (use lspci to get the bus id)
> cd /sys/bus/pci/devices/
> echo 1 > rom
> cat rom > /tmp/vbios.rom
> echo 0 > rom
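The quoted steps can be wrapped in a small shell function. This is a sketch, not the exact procedure given above: the name `dump_vbios`, its arguments, and the optional third argument (an alternative sysfs root, there only so the function can be exercised against a scratch directory) are assumptions. It must be run as root, with the PCI address that `lspci -D` reports for the card.

```shell
# Hypothetical helper: copy a PCI device's option ROM (the VBIOS on a GPU)
# out of sysfs, following the enable / read / disable sequence quoted above.
#   $1  PCI address of the card, as printed by `lspci -D`
#   $2  output file
#   $3  optional sysfs devices directory (defaults to /sys/bus/pci/devices;
#       overriding it is only useful for testing)
dump_vbios() {
    rom="${3:-/sys/bus/pci/devices}/$1/rom"
    if [ ! -e "$rom" ]; then
        echo "dump_vbios: no ROM attribute at $rom" >&2
        return 1
    fi
    echo 1 > "$rom" || return 1     # enable reads of the ROM
    cat "$rom" > "$2"               # copy the image
    status=$?
    echo 0 > "$rom"                 # disable reads again, even if the copy failed
    return "$status"
}
```

Usage (as root): `dump_vbios "$ADDR" /tmp/vbios.rom`, where ADDR holds the card's address from `lspci -D`.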

Sure, sent as private message.

Also, I got hold of an RX570 from another vendor and tested that. It works
fine, with no crash even during stress tests / benchmarks.

Cheers,

Sylvain

Re: Spontaneous reboots when using RX 560

2019-10-22 Thread Sylvain Munaut
Hi All,

More testing over the last few days showed that only the lowest power
mode, or the one slightly above it, works. Oh, I also tested 5.4-rc3 just
in case, but same results.
It doesn't seem to be affected by the PCIe lane speed. Memory seems
stable at 625M and almost stable at 1500M (only a sustained heavy workload
eventually brings it down), but the SoC speed seems pretty touchy.
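Restricting the card to particular DPM states, as in the tests above, goes through amdgpu's sysfs files: select the `manual` performance level, then write the allowed level indices to `pp_dpm_sclk`, `pp_dpm_mclk` or `pp_dpm_pcie`. A minimal sketch with a hypothetical helper name, assuming the RX 560 is `card0` (the index may differ on a machine with two GPUs):

```shell
# Hypothetical helper: allow only some DPM levels for one clock domain.
#   $1  device directory, e.g. /sys/class/drm/card0/device
#   $2  clock domain: sclk, mclk or pcie
#   $3... level indices to allow, as listed in pp_dpm_<domain>
limit_dpm_levels() {
    dev="$1"; domain="$2"
    shift 2
    case "$domain" in
        sclk|mclk|pcie) ;;
        *) echo "limit_dpm_levels: unknown domain '$domain'" >&2; return 2 ;;
    esac
    # The driver expects manual mode before level masks are written.
    echo manual > "$dev/power_dpm_force_performance_level" || return 1
    echo "$*" > "$dev/pp_dpm_$domain"
}
```

For example, `limit_dpm_levels /sys/class/drm/card0/device sclk 0 1` keeps the shader clock in its two lowest states; writing `auto` to `power_dpm_force_performance_level` hands control back to the driver.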

So that would seem to confirm that something is wrong either in the
PowerPlay table itself or in its interpretation by the Linux driver.
I tried brute-loading another RX570's pptable into it, but that
didn't really do much. After writing it to pp_table, the card was
stuck in its lowest clock mode: working fine, but the same as if I had
forced it to low power.
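Reading `pp_table` returns the active PowerPlay table as a binary blob, and writing a blob to it replaces the table, so it is worth keeping a copy of the original before an experiment like the one above. A sketch with a hypothetical helper; the file names are placeholders:

```shell
# Hypothetical helper: save the current PowerPlay table, then load another.
#   $1  device directory, e.g. /sys/class/drm/card0/device
#   $2  binary table to load
#   $3  where to save the current table
swap_pp_table() {
    [ -r "$2" ] || { echo "swap_pp_table: cannot read $2" >&2; return 1; }
    cat "$1/pp_table" > "$3" || return 1   # back up the active table first
    cat "$2" > "$1/pp_table"               # then load the replacement
}
```

The saved copy can be written back the same way; a reboot should also bring back the table stored in the VBIOS.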

Is there any way to extract the PowerPlay table from Windows, since
it's running fine there?
I'm kind of running out of ideas of what to try next.

Cheers,

Sylvain

Re: Spontaneous reboots when using RX 560

2019-10-19 Thread Sylvain Munaut
Finally some progress !

I found a thread with a couple of people having the same symptoms as I
do ([1]), and interestingly that was with the same brand & model of
card.
Although there is no solution, there is a workaround that works:

echo -n low  > /sys/class/drm/card0/device/power_dpm_force_performance_level

Then the card seems stable. At least I was able to get through an
entire GL benchmark and also a bunch of CL tests without crashing. (By
default it crashes nearly instantly).
Of course the card is slow but it's better than nothing and maybe
gives a clue to a solution ?
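The workaround can be wrapped so that it reports whether the driver accepted the value. The setting does not survive a reboot, so it has to be reapplied at every boot (for example from a boot-time script). A minimal sketch; the helper name is hypothetical and `card0` is assumed to be the RX 560, which is not guaranteed on a machine with two GPUs:

```shell
# Hypothetical helper: force a DPM performance level and read it back.
#   $1  level, e.g. auto, low, high or manual
#   $2  optional device directory (defaults to /sys/class/drm/card0/device)
set_perf_level() {
    f="${2:-/sys/class/drm/card0/device}/power_dpm_force_performance_level"
    printf '%s' "$1" > "$f" || return 1
    # Reading the file back shows the level the driver is actually using.
    [ "$(cat "$f")" = "$1" ]
}
```

Usage (as root): `set_perf_level low`, and `set_perf_level auto` to return to the default behaviour.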

Following some advice on IRC, I also tried setting it to "high". That
doesn't crash immediately: the display stays fine and I can move windows
and do light stuff, but as soon as I actually run GL or CL stuff, it
crashes.

I also dumped the PowerPlay tables, see [2]. I can't really make
sense of them; there are definitely some weird values, but I'm not
sure whether that's normal or not.

As I noted earlier in the thread, when I first used the card on
Windows with just AMD's driver, the card was stuck at its lowest
clock rate and performed poorly in benchmarks. It was only after I
loaded ASRock's own tweak utility that the card started to auto-adapt
its clocks / voltages. Not sure if there is a way to dump the Windows
PowerPlay config?


Cheers,

   Sylvain

[1] https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1112121-rx-560-crash-under-light-load
[2] https://pastebin.com/raw/uWh6WLmh

Re: Spontaneous reboots when using RX 560

2019-10-19 Thread Sylvain Munaut
Just in case there was any doubt, it seems an OpenCL workload crashes the
card just as hard.
(That was the AMDGPU-Pro OpenCL lib, legacy version. I can't get PAL to
detect the card at all.)

Cheers,

 Sylvain

Re: Spontaneous reboots when using RX 560

2019-10-18 Thread Sylvain Munaut
nvidia_drm(POE) amdgpu nvidia_modeset(POE) snd_hda_intel snd_seq_midi
ghash_clmulni_intel nvidia(POE) aesni_intel snd_hda_codec
snd_seq_midi_event snd_hda_core aes_x86_64 snd_rawmidi amd_iommu_v2
crypto_simd gpu_sched cryptd joydev input_leds wmi_bmof snd_hwdep
snd_seq glue_helper ttm snd_pcm ucsi_ccg drm_kms_helper typec_ucsi
snd_seq_device typec drm ccp ipmi_devintf snd_timer ipmi_msghandler
snd fb_sys_fops syscopyarea sysfillrect sysimgblt soundcore mac_hid
sch_fq_codel nct6775 hwmon_vid parport_pc ppdev lp parport ip_tables
x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid
hid ixgbe i2c_piix4 igb nvme ahci i2c_nvidia_gpu libahci xfrm_algo
i2c_algo_bit nvme_core dca mdio wmi
[   89.463704] ---[ end trace 455cf9a155c384cb ]---

The "To Be Filled By O.E.M. To Be Filled By O.E.M./" really inspires
confidence ...


Cheers,

Sylvain Munaut

Re: Spontaneous reboots when using RX 560

2019-10-18 Thread Sylvain Munaut
Hi Christian,


> I would also test if disabling power features helps as well, try to add
> amdgpu.pg_mask=0 and amdgpu.cg_mask=0 to the kernel command line for
> example.

Thanks for the suggestion.
Just tried this, no luck. I also tried 'runpm=0' (but apparently that's
for laptops only, so ...)

Even with cg_mask=0, I still see this in amdgpu_pm_info; not sure if
that's expected or if the option was somehow ignored?
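One way to tell whether such a parameter reached the kernel at all is to look at the running command line, and for module parameters at `/sys/module/amdgpu/parameters/` (e.g. `cat /sys/module/amdgpu/parameters/cg_mask`), assuming the module exposes that parameter as readable. On Ubuntu the options go into `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `sudo update-grub`. A sketch of the command-line check, with a hypothetical helper name:

```shell
# Hypothetical helper: print the value of KEY=VALUE on a kernel command line.
#   $1  parameter name, e.g. amdgpu.cg_mask
#   $2  optional file to read (defaults to /proc/cmdline)
# Returns non-zero if the parameter is absent.
cmdline_value() {
    awk -v key="$1=" '
        { for (i = 1; i <= NF; i++)
              if (index($i, key) == 1) {
                  print substr($i, length(key) + 1); found = 1; exit
              } }
        END { exit !found }' "${2:-/proc/cmdline}"
}
```

For example, `cmdline_value amdgpu.cg_mask` prints `0` if the option was passed as above, and fails if it never made it onto the command line.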


Clock Gating Flags Mask: 0x16b00
Graphics Medium Grain Clock Gating: Off
Graphics Medium Grain memory Light Sleep: Off
Graphics Coarse Grain Clock Gating: Off
Graphics Coarse Grain memory Light Sleep: Off
Graphics Coarse Grain Tree Shader Clock Gating: Off
Graphics Coarse Grain Tree Shader Light Sleep: Off
Graphics Command Processor Light Sleep: Off
Graphics Run List Controller Light Sleep: Off
Graphics 3D Coarse Grain Clock Gating: Off
Graphics 3D Coarse Grain memory Light Sleep: Off
Memory Controller Light Sleep: On
Memory Controller Medium Grain Clock Gating: On
System Direct Memory Access Light Sleep: Off
System Direct Memory Access Medium Grain Clock Gating: On
Bus Interface Medium Grain Clock Gating: Off
Bus Interface Light Sleep: Off
Unified Video Decoder Medium Grain Clock Gating: On
Video Compression Engine Medium Grain Clock Gating: On
Host Data Path Light Sleep: Off
Host Data Path Medium Grain Clock Gating: On
Digital Right Management Medium Grain Clock Gating: Off
Digital Right Management Light Sleep: Off
Rom Medium Grain Clock Gating: Off
Data Fabric Medium Grain Clock Gating: Off
Address Translation Hub Medium Grain Clock Gating: Off
Address Translation Hub Light Sleep: Off

GFX Clocks and Power:
300 MHz (MCLK)
214 MHz (SCLK)
387 MHz (PSTATE_SCLK)
625 MHz (PSTATE_MCLK)
775 mV (VDDGFX)
7.254 W (average GPU)

GPU Temperature: 34 C
GPU Load: 0 %
MEM Load: 6 %

UVD: Disabled

VCE: Disabled


I'm not really sure what to try next. Unfortunately, I don't have
access to any other card or motherboard I could use to test with
:/
(Or anything fancy like a PCIe bus analyzer or stuff like that.)

My understanding of the first error message that shows up is that the
card itself is trying to access a memory region it's not allowed to
access, right?
[  144.311704] amdgpu :06:00.0: AMD-Vi: Event logged
[IO_PAGE_FAULT domain=0x address=0xa076010100 flags=0x0010]
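When the log gets long, it can help to pull the faulting address and flags out of every `IO_PAGE_FAULT` event to see whether the same addresses keep recurring. A small filter sketch (hypothetical name); it expects each event on a single line, as `dmesg` prints it rather than wrapped as in this mail, and does not try to decode the flag bits:

```shell
# Hypothetical filter: read kernel log text on stdin and print
# "<address> <flags>" for every AMD-Vi IO_PAGE_FAULT event.
io_page_faults() {
    sed -n 's/.*IO_PAGE_FAULT.*address=\(0x[0-9a-fA-F]*\) flags=\(0x[0-9a-fA-F]*\).*/\1 \2/p'
}
```

Usage: `dmesg | io_page_faults | sort | uniq -c`.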

Cheers,

Sylvain

Re: Spontaneous reboots when using RX 560

2019-10-17 Thread Sylvain Munaut
So a bit more testing.

I was using a bit of an "unusual" config, I guess, with 2 GPUs and some
other PCIe cards (10G, ..).
So I simplified and went to the most standard setup I could think of:
_just_ the RX 560 card plugged into the main PCIe 16x slot directly
connected to the CPU.

And exact same results, no change in behavior.

So on one hand I'm happy that neither the other cards nor having the AMD
GPU in the second slot is the issue (because I really need that config),
but on the other hand, I'm no closer to finding the issue :/

Cheers,

     Sylvain Munaut

Re: Spontaneous reboots when using RX 560

2019-10-17 Thread Sylvain Munaut
>  From the hardware point of view the only thing which comes to mind is
> that you somehow triggered the ESD protection.
>
> I assume you can rule out an unstable physical connection (because it
> works on windows), so the only thing left is that there is something
> very very badly going wrong with power management.
>
> Have you "tuned" the power tables on the board somehow?

Nope, not at all.

On Windows, I had actually noticed that before I installed the
ASRock utility for the card, it was staying at its lowest clock.
I had the Radeon / AMD drivers installed of course, but not the vendor
tools for the board. Once I installed those, it started automatically
going to higher power states as the load varied. And it's set to the
"default" profile.

On Linux I haven't done anything: just a fresh Ubuntu 19.10 install
with amdgpu loaded. Not sure if I need to do anything else. I'm not
even sure how to monitor the card frequency / voltage on Linux.
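On Linux the current DPM state can be read from sysfs: `pp_dpm_sclk` and `pp_dpm_mclk` list the available shader and memory clock levels and mark the active one with `*`, and `amdgpu_pm_info` under debugfs (for example `/sys/kernel/debug/dri/0/amdgpu_pm_info`, as root; the index may differ) also shows the current clocks and GFX voltage. A minimal sketch that prints just the active levels, assuming the card is `card0`; the helper name is hypothetical:

```shell
# Hypothetical helper: print the active SCLK and MCLK levels.
#   $1  optional device directory (defaults to /sys/class/drm/card0/device)
show_clocks() {
    dev="${1:-/sys/class/drm/card0/device}"
    for f in pp_dpm_sclk pp_dpm_mclk; do
        # The driver marks the level currently in use with "*".
        printf '%s: %s\n' "$f" "$(grep -F '*' "$dev/$f")"
    done
}
```

Calling it in a loop while reproducing the problem shows which state the card was in last.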


> Or maybe multiple GPUs connected to the same power supply?

That machine has another GPU, an NVIDIA one in the first x16 slot. The
NVIDIA GPU has a PCIe power connector going to it.
The RX 560 board (
https://www.asrock.com/Graphics-Card/AMD/Phantom%20Gaming%20Radeon%20RX560%202G/
) doesn't have any additional PCIe power input, so it gets all its
power from the PCIe slot itself.

The PC has a good-quality 650W Corsair power supply. During all
these tests the NVIDIA GPU was idle (no X server running on it or
anything), and the PSU fan didn't even spin up (it doesn't spin
below 350 W), so I think there is plenty of margin.


Cheers,

Sylvain

Re: Spontaneous reboots when using RX 560

2019-10-17 Thread Sylvain Munaut
Hi,


> > I have RX 560 2G card. It's plugged into a 16x physical / 4x
> > electrical slot of a X570 chipset motherboard with a Ryzen 3700X CPU.
> > The hardware works fine and is stable under Windows (tested with
> > games, benchmarks, stress-tests, ...)
>
> Does booting with pci=noats on the kernel command line in grub fix the issue?

It doesn't :/

The message is slightly different, but same idea:

[   83.704035] amdgpu :06:00.0: AMD-Vi: Event logged
[IO_PAGE_FAULT domain=0x address=0x0 flags=0x0020]
[   88.732685] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]]
*ERROR* Waiting for fences timed out or interrupted!
[   92.074379] ixgbe :04:00.1: Adapter removed
[   93.480989] igb :07:00.0 enp7s0: PCIe link lost

So it screws up PCIe very badly :/
Specifically, it seems to affect everything connected to the X570 chipset.
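To see which devices are still present and what link they negotiated, before or after such an event, the standard PCI sysfs attributes `current_link_speed`, `current_link_width`, `max_link_speed` and `max_link_width` can be read per device. A sketch with a hypothetical helper; a device whose sysfs entry is gone is reported as missing:

```shell
# Hypothetical helper: report the negotiated PCIe link of one device.
#   $1  PCI address as printed by `lspci -D`
#   $2  optional sysfs devices directory (defaults to /sys/bus/pci/devices)
pcie_link() {
    d="${2:-/sys/bus/pci/devices}/$1"
    if [ ! -d "$d" ]; then
        echo "$1: missing"
        return 1
    fi
    echo "$1: $(cat "$d/current_link_speed") x$(cat "$d/current_link_width")" \
         "(max $(cat "$d/max_link_speed") x$(cat "$d/max_link_width"))"
}
```

Running it for the GPU, the 10G NIC and the onboard Ethernet controller shows whether the devices behind the chipset dropped off together.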

Cheers,

Sylvain

Spontaneous reboots when using RX 560

2019-10-17 Thread Sylvain Munaut
Hi,

I have an RX 560 2G card. It's plugged into a 16x physical / 4x
electrical slot of an X570 chipset motherboard with a Ryzen 3700X CPU.
The hardware works fine and is stable under Windows (tested with
games, benchmarks, stress tests, ...)

But when trying, for instance, Steam under Linux, or even just the 'app
launcher' from Ubuntu that has some visual effects, the machine
instantly reboots.
Also, after the reboot, the GPU is no longer detected (lspci doesn't
show it, and under Windows it's nowhere to be seen either). It needs
to be physically powered off and back on for it to work again.

I added a serial console to try to get some output, and when doing that
it doesn't immediately reboot (but the rest is the same: the machine is
unusable, and after a reboot the GPU is missing until a full
power-off).
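For reference, a serial console like this one is usually enabled through the kernel command line. A configuration sketch for Ubuntu's `/etc/default/grub`, assuming the first serial port (`ttyS0`) at 115200 baud; port and speed depend on the hardware, and `sudo update-grub` has to be run afterwards:

```shell
# /etc/default/grub (fragment). Assumes the serial port is ttyS0 at 115200 8N1.
# Kernel messages go to every console= listed; the last one also becomes
# /dev/console. "quiet" is left out so kernel messages are not suppressed.
GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS0,115200n8"
```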

This is the output I get :

[  144.311704] amdgpu :06:00.0: AMD-Vi: Event logged
[IO_PAGE_FAULT domain=0x address=0xa076010100 flags=0x0010]
[  144.322734] amdgpu :06:00.0: AMD-Vi: Event logged
[IO_PAGE_FAULT domain=0x address=0xa076230100 flags=0x0010]
[  144.333751] amdgpu :06:00.0: AMD-Vi: Event logged
[IO_PAGE_FAULT domain=0x address=0xa076030100 flags=0x0010]
[  147.028625] AMD-Vi: Completion-Wait loop timed out
[  147.206336] AMD-Vi: Completion-Wait loop timed out
[  147.368260] AMD-Vi: Completion-Wait loop timed out
[  147.532296] AMD-Vi: Completion-Wait loop timed out
[  147.703269] AMD-Vi: Completion-Wait loop timed out
[  147.845840] AMD-Vi: Completion-Wait loop timed out
[  147.860950] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT
device=06:00.0 address=0x81b1c1e60]
[  148.015778] AMD-Vi: Completion-Wait loop timed out
[  148.187270] AMD-Vi: Completion-Wait loop timed out

(and then it seems to loop forever, always printing that).
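With a log that loops like this, a per-type count of the AMD-Vi messages makes it easier to see what was logged besides the flood of Completion-Wait timeouts. A sketch of such a filter (hypothetical name; like raw `dmesg` output, it expects one message per line, and `COMPLETION_WAIT_TIMEOUT` is just a label chosen here):

```shell
# Hypothetical filter: count AMD-Vi message types in kernel log text on stdin.
amdvi_summary() {
    awk '
        /AMD-Vi: Completion-Wait loop timed out/ { n["COMPLETION_WAIT_TIMEOUT"]++ }
        match($0, /\[(IO_PAGE_FAULT|IOTLB_INV_TIMEOUT)/) {
            # Strip the leading "[" from the matched event name.
            n[substr($0, RSTART + 1, RLENGTH - 1)]++
        }
        END { for (k in n) print k, n[k] }' | LC_ALL=C sort
}
```

Usage: `dmesg | amdvi_summary`, or feed it the captured serial console log.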

I tried Ubuntu 19.10 with 5.3.0-18-generic, and also Ubuntu 19.04
with 5.0.0-31-generic. I also tried a DKMS module from AMDGPU-PRO 19.30,
patched to build and load under 5.3.0. All give the same result.

Cheers,

Sylvain Munaut