BUG: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

2018-09-17 Thread Daniel Andersson
Hey,

The GPU hangs for a while (~20 s) and dmesg shows:

[  110.343379] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, last signaled seq=7273, last emitted seq=7275
[  110.343385] [drm] GPU recovery disabled.
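
The gap between the two sequence numbers in that message is the number of fences that were emitted but had not signaled when the timeout fired. A minimal sketch for pulling that out of the kernel log, assuming each message arrives on a single line as dmesg prints it (the function name `ring_timeout_gap` is made up for illustration):

```shell
# Hypothetical helper: read kernel log lines on stdin and, for every amdgpu
# ring timeout, print the ring name and the number of unsignaled fences.
ring_timeout_gap() {
    # Keep only the ring name and the two sequence numbers.
    sed -n 's/.*\*ERROR\* ring \([^ ]*\) timeout, last signaled seq=\([0-9]*\), last emitted seq=\([0-9]*\).*/\1 \2 \3/p' |
    while read -r ring signaled emitted; do
        echo "$ring: $(( $emitted - $signaled )) fence(s) not signaled"
    done
}
```

It would be used as `dmesg | ring_timeout_gap`.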

This is on 4.18.8-arch1-1-ARCH with a Vega10.

Reproduce by:
git clone git://github.com/bkaradzic/bx.git
git clone git://github.com/bkaradzic/bimg.git
git clone git://github.com/bkaradzic/bgfx.git

cd bgfx
make -jX linux-release64
cd examples/runtime
../../.build/linux64_gcc/bin/examplesRelease
Then switch to example "37-gpudrivenrendering" from the drop-down menu.

/ Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: BUG: *ERROR* No EDID read

2018-09-12 Thread Daniel Andersson
Hey,

I had some time to bisect again tonight and it seems
018d82e5f02ef3583411bcaa4e00c69786f46f19
got back in again through:
# first bad commit: [d98c71dadc2d0debdb80beb5a478baf1e6f98758] Merge
drm-upstream/drm-next into drm-misc-next

/ Daniel
On Wed, 29 Aug 2018 at 16:02, Daniel Andersson  wrote:
>
> Hey again,
>
> This is an issue for me yet again on 4.19-rc1:
>
> [5.743354] [drm:dc_link_detect [amdgpu]] *ERROR* No EDID read.
>
> / Daniel
>
> dmesg with drm.debug=0x4:
>
> [0.00] Linux version 4.19.0-rc1-ARCH (engy@sleipnir) (gcc
> version 8.2.0 (GCC)) #1 SMP PREEMPT Tue Aug 28 10:36:50 CEST 2018
> [0.00] Command line: BOOT_IMAGE=/vmlinuz-linuxtest
> root=UUID=27247597-a354-42f3-8040-caff9592a297 rw quiet drm.debug=0x4
> [0.00] KERNEL supported cpus:
> [0.00]   AMD AuthenticAMD
> [0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating
> point registers'
> [0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> [0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> [0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
> [0.00] x86/fpu: Enabled xstate features 0x7, context size is
> 832 bytes, using 'compacted' format.
> [0.00] BIOS-provided physical RAM map:
> [0.00] BIOS-e820: [mem 0x-0x0009] usable
> [0.00] BIOS-e820: [mem 0x000a-0x000f] reserved
> [0.00] BIOS-e820: [mem 0x0010-0x09de] usable
> [0.00] BIOS-e820: [mem 0x09df-0x09ff] reserved
> [0.00] BIOS-e820: [mem 0x0a00-0xb75f] usable
> [0.00] BIOS-e820: [mem 0xb760-0xb8b4afff] reserved
> [0.00] BIOS-e820: [mem 0xb8b4b000-0xb8f2cfff] usable
> [0.00] BIOS-e820: [mem 0xb8f2d000-0xb902dfff] ACPI NVS
> [0.00] BIOS-e820: [mem 0xb902e000-0xb9cf6fff] reserved
> [0.00] BIOS-e820: [mem 0xb9cf7000-0xb9e16fff] type 20
> [0.00] BIOS-e820: [mem 0xb9e17000-0xbbff] usable
> [0.00] BIOS-e820: [mem 0xbc00-0xbfff] reserved
> [0.00] BIOS-e820: [mem 0xed00-0xed07] reserved
> [0.00] BIOS-e820: [mem 0xed0f-0xed0f0fff] reserved
> [0.00] BIOS-e820: [mem 0xef60-0xef67] reserved
> [0.00] BIOS-e820: [mem 0xef6f-0xef6f0fff] reserved
> [0.00] BIOS-e820: [mem 0xfea0-0xfec00fff] reserved
> [0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
> [0.00] BIOS-e820: [mem 0xfec3-0xfec30fff] reserved
> [0.00] BIOS-e820: [mem 0xfed0-0xfed00fff] reserved
> [0.00] BIOS-e820: [mem 0xfed4-0xfed44fff] reserved
> [0.00] BIOS-e820: [mem 0xfed8-0xfed8] reserved
> [0.00] BIOS-e820: [mem 0xfedc-0xfedc0fff] reserved
> [0.00] BIOS-e820: [mem 0xfedc2000-0xfedc] reserved
> [0.00] BIOS-e820: [mem 0xfedd4000-0xfedd5fff] reserved
> [0.00] BIOS-e820: [mem 0xfee0-0xfeef] reserved
> [0.00] BIOS-e820: [mem 0xff00-0x] reserved
> [0.00] BIOS-e820: [mem 0x0001-0x00083f2f] usable
> [0.00] BIOS-e820: [mem 0x00083f30-0x00083fff] reserved
> [0.00] NX (Execute Disable) protection: active
> [0.00] efi: EFI v2.60 by American Megatrends
> [0.00] efi:  ACPI 2.0=0xb8f2d000  ACPI=0xb8f2d000
> SMBIOS=0xb9c66000  SMBIOS 3.0=0xb9c65000  ESRT=0xb5ebef98
> [0.00] SMBIOS 3.0.0 present.
> [0.00] DMI: To Be Filled By O.E.M. To Be Filled By O.E.M./X399
> Taichi, BIOS P2.00 11/21/2017
> [0.00] tsc: Fast TSC calibration failed
> [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
> [0.00] e820: remove [mem 0x000a-0x000f] usable
> [0.00] last_pfn = 0x83f300 max_arch_pfn = 0x4
> [0.00] MTRR default type: uncachable
> [0.00] MTRR fixed ranges enabled:
> [0.00]   0-9 write-back
> [0.00]   A-B write-through
> [0.00]   C-D uncachable
> [0.00]   E-F write-protect
> [0.00] MTRR variable ranges enabled:
> [0.00]   0 base  mask 8000 write-back
> [0.00]   1 base 8000 mask C000 write-back
> [0.00]   2 base BC00 m

Re: BUG: *ERROR* No EDID read

2018-07-09 Thread Daniel Andersson
Alright, I redid the bisect and this time ended up on
018d82e5f02ef3583411bcaa4e00c69786f46f19 . If I revert it, it fixes my
issue. Reverting it on 4.18-rc2 also works.

// Daniel

On 5 July 2018 at 21:47, Daniel Andersson  wrote:
> [0.00] Command line: BOOT_IMAGE=/vmlinuz-linuxtest
> root=UUID=27247597-a354-42f3-8040-caff9592a297 drm.debug=0x4 rw quiet
> [0.00] Kernel command line: BOOT_IMAGE=/vmlinuz-linuxtest
> root=UUID=27247597-a354-42f3-8040-caff9592a297 drm.debug=0x4 rw quiet
> [5.674793] [drm:bios_parser_get_firmware_info [amdgpu]] At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.674833] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.674887] [drm:bios_parser_get_firmware_info [amdgpu]] At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.674930] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.674974] [drm:bios_parser_get_firmware_info [amdgpu]] At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675014] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675056] [drm:bios_parser_get_firmware_info [amdgpu]] At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675095] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675138] [drm:bios_parser_get_firmware_info [amdgpu]] At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675178] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675221] [drm:bios_parser_get_firmware_info [amdgpu]] At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675260] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675304] [drm:bios_parser_get_firmware_info [amdgpu]] At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675342] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675384] [drm:bios_parser_get_firmware_info [amdgpu]] At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675422] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675465] [drm:bios_parser_get_firmware_info [amdgpu]] At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675503] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675587] [drm:bios_parser_get_firmware_info [amdgpu]] At
> bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675626] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
> bios_parser_get_firmware_info switch, got major 3 minor 1
>
> I don't really know what is going on. If I go back to 4.17 and apply
> my "fix". It doesn't work. I suppose my bisect didn't get me the right
> commit. I probably never tested that the commit, from the bisect, was
> actually bad. I was also a little lazy and did "bisect start
> 29dcea88779c856c7dc92040a0c01233263101d4
> 6da6c0db5316275015e8cc2959f12a17584aeb64 -- drivers/gpu/drm/amd".
>
> I guess I'll try another bisect tomorrow on the entire tree, sorry for
> the extra work.
>
> // Daniel
>
> On 5 July 2018 at 20:22, Harry Wentland  wrote:
>> On 2018-07-05 01:43 PM, Daniel Andersson wrote:
>>> Well, this workaround:
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>>> b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>>> index 10a5807a7e8b..d0f5910c906c 100644
>>> --- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>>> +++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>>> @@ -1321,6 +1321,8 @@ static enum bp_result bios_parser_get_firmware_info(
>>>   header = GET_IMAGE(struct atom_common_table_header,
>>>   DATA_TABLES(firmwareinfo));
>>>   get_atom_data_table_revision(header, &revision);
>>> +dm_output_to_console("At bios_parser_get_firmware_info switch,
>>> got major %d minor %d", revision.major, revision.minor);
>>> +dm_error("At bios_parser_get_firmware_info switch, got major %d
>>> minor %d", revision.major, revision.minor);
>>>   switch (revision.major) {
>>>   case 3:
>>>   switch (revision.minor) {
>>> @@ -1328,7 +1330,7 @@ static enum bp_result b

Re: BUG: *ERROR* No EDID read

2018-07-05 Thread Daniel Andersson
[0.00] Command line: BOOT_IMAGE=/vmlinuz-linuxtest
root=UUID=27247597-a354-42f3-8040-caff9592a297 drm.debug=0x4 rw quiet
[0.00] Kernel command line: BOOT_IMAGE=/vmlinuz-linuxtest
root=UUID=27247597-a354-42f3-8040-caff9592a297 drm.debug=0x4 rw quiet
[5.674793] [drm:bios_parser_get_firmware_info [amdgpu]] At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.674833] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.674887] [drm:bios_parser_get_firmware_info [amdgpu]] At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.674930] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.674974] [drm:bios_parser_get_firmware_info [amdgpu]] At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675014] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675056] [drm:bios_parser_get_firmware_info [amdgpu]] At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675095] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675138] [drm:bios_parser_get_firmware_info [amdgpu]] At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675178] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675221] [drm:bios_parser_get_firmware_info [amdgpu]] At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675260] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675304] [drm:bios_parser_get_firmware_info [amdgpu]] At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675342] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675384] [drm:bios_parser_get_firmware_info [amdgpu]] At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675422] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675465] [drm:bios_parser_get_firmware_info [amdgpu]] At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675503] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675587] [drm:bios_parser_get_firmware_info [amdgpu]] At
bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675626] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At
bios_parser_get_firmware_info switch, got major 3 minor 1

I don't really know what is going on. If I go back to 4.17 and apply
my "fix", it doesn't work. I suppose my bisect didn't get me the right
commit. I probably never verified that the commit the bisect landed on
was actually bad. I was also a little lazy and did "bisect start
29dcea88779c856c7dc92040a0c01233263101d4
6da6c0db5316275015e8cc2959f12a17584aeb64 -- drivers/gpu/drm/amd".

I guess I'll try another bisect tomorrow on the entire tree, sorry for
the extra work.
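
For the full-tree run, a rough sketch of the two steps involved: starting the bisect without a path limit, and naming the two commits worth re-testing once git reports a first bad commit. The function names are hypothetical, and the revisions passed in would be the same two as in the command quoted above.

```shell
# Sketch only; both function names are made up for illustration.

# Usage: bisect_whole_tree <bad-rev> <good-rev>
# Same as the earlier command, but without "-- drivers/gpu/drm/amd", so
# commits outside that directory are considered as well.
bisect_whole_tree() {
    git bisect start "$1" "$2"
}

# Usage: confirm_candidates <first-bad-commit>
# Prints the reported commit and its first parent. The first should show
# the bug and the second should not; if that does not hold, some step of
# the bisect was marked wrongly. For a merge commit only the first parent
# is listed here.
confirm_candidates() {
    echo "expect bad:  $(git rev-parse "$1")"
    echo "expect good: $(git rev-parse "$1^")"
}
```

Each step is then marked by hand with `git bisect good` or `git bisect bad` after booting the kernel built from it.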

// Daniel

On 5 July 2018 at 20:22, Harry Wentland  wrote:
> On 2018-07-05 01:43 PM, Daniel Andersson wrote:
>> Well, this workaround:
>>
>> diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>> b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>> index 10a5807a7e8b..d0f5910c906c 100644
>> --- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>> +++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>> @@ -1321,6 +1321,8 @@ static enum bp_result bios_parser_get_firmware_info(
>>   header = GET_IMAGE(struct atom_common_table_header,
>>   DATA_TABLES(firmwareinfo));
>>   get_atom_data_table_revision(header, &revision);
>> +dm_output_to_console("At bios_parser_get_firmware_info switch,
>> got major %d minor %d", revision.major, revision.minor);
>> +dm_error("At bios_parser_get_firmware_info switch, got major %d
>> minor %d", revision.major, revision.minor);
>>   switch (revision.major) {
>>   case 3:
>>   switch (revision.minor) {
>> @@ -1328,7 +1330,7 @@ static enum bp_result bios_parser_get_firmware_info(
>>   result = get_firmware_info_v3_1(bp, info);
>>   break;
>>   case 2:
>> - result = get_firmware_info_v3_2(bp, info);
>> + result = get_firmware_info_v3_1(bp, info);
>>   break;
>>   default:
>>   break;
>>
>> "works":
>> [engy][~/devel/3pp/linux] ((6e65fb862064...)|BISECTING)$ xrandr
>> Screen 0: minimum 320 x 200, current 2560 x 1440, maximum 16384 x 16384
>> DisplayPort-0 connected 2560x1440+0+0 (normal left inverted right x
>> axis y axis) 598mm x 336mm
>>

Re: BUG: *ERROR* No EDID read

2018-07-05 Thread Daniel Andersson
Well, this workaround:

diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
index 10a5807a7e8b..d0f5910c906c 100644
--- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
+++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
@@ -1321,6 +1321,8 @@ static enum bp_result bios_parser_get_firmware_info(
  header = GET_IMAGE(struct atom_common_table_header,
  DATA_TABLES(firmwareinfo));
  get_atom_data_table_revision(header, &revision);
+dm_output_to_console("At bios_parser_get_firmware_info switch,
got major %d minor %d", revision.major, revision.minor);
+dm_error("At bios_parser_get_firmware_info switch, got major %d
minor %d", revision.major, revision.minor);
  switch (revision.major) {
  case 3:
  switch (revision.minor) {
@@ -1328,7 +1330,7 @@ static enum bp_result bios_parser_get_firmware_info(
  result = get_firmware_info_v3_1(bp, info);
  break;
  case 2:
- result = get_firmware_info_v3_2(bp, info);
+ result = get_firmware_info_v3_1(bp, info);
  break;
  default:
  break;

"works":
[engy][~/devel/3pp/linux] ((6e65fb862064...)|BISECTING)$ xrandr
Screen 0: minimum 320 x 200, current 2560 x 1440, maximum 16384 x 16384
DisplayPort-0 connected 2560x1440+0+0 (normal left inverted right x
axis y axis) 598mm x 336mm
   2560x1440 59.95*+ 120.00    99.95    84.98    23.97
   1920x1200 59.95
   1920x1080 59.95
   1600x1200 59.95
   1680x1050 59.95
   1280x1024 59.95
   1440x900  59.95
   1280x800  59.95
   1280x720  59.95
   1024x768  59.95
   800x600   59.95
   640x480   59.95
DisplayPort-1 disconnected (normal left inverted right x axis y axis)
HDMI-A-0 disconnected (normal left inverted right x axis y axis)
HDMI-A-1 disconnected (normal left inverted right x axis y axis)

Where does the output of dm_error and dm_output_to_console end up?

// Daniel

On 5 July 2018 at 18:42, Deucher, Alexander  wrote:
> So your vbios has table v3.1 so it should not be affected by that patch.
> Does reverting that patch actually fix the issue?
>
>
> Alex
>
> ________
> From: amd-gfx  on behalf of Daniel
> Andersson 
> Sent: Thursday, July 5, 2018 12:22:17 PM
> To: Alex Deucher
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: BUG: *ERROR* No EDID read
>
> I have not flashed any GPU BIOS. It's not a reference Vega though,
> Sapphire something. Maybe they made changes?
>
> vbios is attached.
>
> // Daniel
>
> On 5 July 2018 at 15:38, Alex Deucher  wrote:
>> On Mon, Jul 2, 2018 at 5:39 PM, Daniel Andersson 
>> wrote:
>>> Sure, bisecting gets me 6e65fb862064663ad3a08f964af1e8f3f2abf688 .
>>>
>>> In drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c,
>>> get_firmware_info_v3_1() works but get_firmware_info_v3_2() does not
>>> do the right thing for my Vega.
>>>
>>> Could I break my GPU if I were to set some bad/wrong frequency there?
>>
>> vega10 should not hit that new path at all.  Have you edited your
>> vbios?  Can you send us a copy?  To get a copy of the vbios:
>>
>> Without the driver loaded:
>> (as root)
>> (use lspci to get the bus id)
>> cd /sys/bus/pci/devices/
>> echo 1 > rom
>> cat rom > /tmp/vbios.rom
>> echo 0 > rom
>>
>> If the driver is loaded:
>> (as root)
>> cat /sys/kernel/debug/dri/0/amdgpu_vbios > /tmp/vbios.rom
>>
>> Alex
>>
>>>
>>> lspci:
>>> 43:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>>> [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c1) (prog-if 00 [VGA
>>> controller])
>>>   Subsystem: Sapphire Technology Limited Vega 10 XT [Radeon RX Vega
>>> 64]
>>>   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>>> Stepping- SERR- FastB2B- DisINTx+
>>>   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>> SERR- >>   Latency: 0, Cache Line Size: 64 bytes
>>>   Interrupt: pin A routed to IRQ 85
>>>   Region 0: Memory at d000 (64-bit, prefetchable) [size=256M]
>>>   Region 2: Memory at e000 (64-bit, prefetchable) [size=2M]
>>>   Region 4: I/O ports at f000 [size=256]
>>>   Region 5: Memory at ed40 (32-bit, non-prefetchable) [size=512K]
>>>   Expansion ROM at ed48 [disabled] [size=128K]
>>>   Capabilities: [48] Vendor Specific Information: Len=08 
>>>   Capabilities: [50] Power Management version 3
>>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
>>> PME(D0-,D1+,D2+,D3hot+,D3cold+)
>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>   Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
>>>   

Re: BUG: *ERROR* No EDID read

2018-07-05 Thread Daniel Andersson
I have not flashed any GPU BIOS. It's not a reference Vega though,
Sapphire something. Maybe they made changes?

vbios is attached.

// Daniel

On 5 July 2018 at 15:38, Alex Deucher  wrote:
> On Mon, Jul 2, 2018 at 5:39 PM, Daniel Andersson  wrote:
>> Sure, bisecting gets me 6e65fb862064663ad3a08f964af1e8f3f2abf688 .
>>
>> In drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c,
>> get_firmware_info_v3_1() works but get_firmware_info_v3_2() does not
>> do the right thing for my Vega.
>>
>> Could I break my GPU if I were to set some bad/wrong frequency there?
>
> vega10 should not hit that new path at all.  Have you edited your
> vbios?  Can you send us a copy?  To get a copy of the vbios:
>
> Without the driver loaded:
> (as root)
> (use lspci to get the bus id)
> cd /sys/bus/pci/devices/
> echo 1 > rom
> cat rom > /tmp/vbios.rom
> echo 0 > rom
>
> If the driver is loaded:
> (as root)
> cat /sys/kernel/debug/dri/0/amdgpu_vbios > /tmp/vbios.rom
>
> Alex
>
>>
>> lspci:
>> 43:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>> [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c1) (prog-if 00 [VGA
>> controller])
>>   Subsystem: Sapphire Technology Limited Vega 10 XT [Radeon RX Vega
>> 64]
>>   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>> Stepping- SERR- FastB2B- DisINTx+
>>   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>> SERR- >   Latency: 0, Cache Line Size: 64 bytes
>>   Interrupt: pin A routed to IRQ 85
>>   Region 0: Memory at d000 (64-bit, prefetchable) [size=256M]
>>   Region 2: Memory at e000 (64-bit, prefetchable) [size=2M]
>>   Region 4: I/O ports at f000 [size=256]
>>   Region 5: Memory at ed40 (32-bit, non-prefetchable) [size=512K]
>>   Expansion ROM at ed48 [disabled] [size=128K]
>>   Capabilities: [48] Vendor Specific Information: Len=08 
>>   Capabilities: [50] Power Management version 3
>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
>> PME(D0-,D1+,D2+,D3hot+,D3cold+)
>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>   Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
>> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1
>> unlimited
>>   ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>>   RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
>>   MaxPayload 256 bytes, MaxReadReq 512 bytes
>> DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr-
>> TransPend-
>> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency
>> L0s <64ns, L1 <1us
>>   ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
>> LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>   ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>> LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive-
>> BWMgmt- ABWMgmt-
>> DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+,
>> OBFF Not Supported
>>AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF
>> Disabled
>>AtomicOpsCtl: ReqEn-
>> LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>>Transmit Margin: Normal Operating Range,
>> EnterModifiedCompliance- ComplianceSOS-
>>Compliance De-emphasis: -6dB
>> LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+,
>> EqualizationPhase1+
>>EqualizationPhase2+, EqualizationPhase3+,
>> LinkEqualizationRequest-
>>   Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>> Address: fee0  Data: 
>>   Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
>> Len=010 
>>   Capabilities: [150 v2] Advanced Error Reporting
>> UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
>> MalfTLP- ECRC- UnsupReq- ACSViol-
>> UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
>> MalfTLP- ECRC- UnsupReq- ACSViol-
>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+
>> MalfTLP+ ECRC- UnsupReq- ACSViol-
>> CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>> CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>> ECRCChkCap+ ECRCChkEn-
>>   MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>> HeaderLog:    
>>   Capa

Re: BUG: *ERROR* No EDID read

2018-07-03 Thread Daniel Andersson
Sure, bisecting gets me 6e65fb862064663ad3a08f964af1e8f3f2abf688 .

In drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c,
get_firmware_info_v3_1() works but get_firmware_info_v3_2() does not
do the right thing for my Vega.

Could I break my GPU if I were to set some bad/wrong frequency there?

lspci:
43:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
[AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c1) (prog-if 00 [VGA
controller])
  Subsystem: Sapphire Technology Limited Vega 10 XT [Radeon RX Vega
64]
  Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
SERR- 
  Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
  Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1
unlimited
  ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
  RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
  MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr-
TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency
L0s <64ns, L1 <1us
  ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
  ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive-
BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+,
OBFF Not Supported
   AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF
Disabled
   AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
   Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
   Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+,
EqualizationPhase1+
   EqualizationPhase2+, EqualizationPhase3+,
LinkEqualizationRequest-
  Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: fee0  Data: 
  Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
Len=010 
  Capabilities: [150 v2] Advanced Error Reporting
UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+
MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
ECRCChkCap+ ECRCChkEn-
  MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog:    
  Capabilities: [200 v1] #15
  Capabilities: [270 v1] #19
  Capabilities: [2a0 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd-
EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd-
EgressCtrl- DirectTrans-
  Capabilities: [2b0 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable+, Smallest Translation Unit: 00
  Capabilities: [2c0 v1] Page Request Interface (PRI)
PRICtl: Enable- Reset-
PRISta: RF- UPRGI- Stopped+
Page Request Capacity: 0020, Page Request Allocation: 
  Capabilities: [2d0 v1] Process Address Space ID (PASID)
PASIDCap: Exec+ Priv+, Max PASID Width: 10
PASIDCtl: Enable- Exec- Priv-
  Capabilities: [320 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
  Kernel driver in use: amdgpu
  Kernel modules: amdgpu

// Daniel

On 2 July 2018 at 21:29, Alex Deucher  wrote:
> On Mon, Jul 2, 2018 at 3:21 PM, Daniel Andersson  wrote:
>> I get:
>> [drm:dc_link_detect [amdgpu]] *ERROR* No EDID read.
>> on boot. This started happening on 4.17 and is still an issue on 4.18-rc2.
>>
>> I have a Vega64 connected on Display Port to a monitor(1440p). The monitor
>> doesn't have any additional ports so I can't test HDMI.
>>
>> xrandr:
>> Screen 0: minimum 320 x 200, current 1024 x 768, maximum 16384 x 16384
>> DisplayPort-0 connected 1024x768+0+0 (normal left inverted right x axis y
>> axis) 0mm x 0mm
>>1024x768  60.00*
>>800x600   60.32    56.25
>>848x480   60.00
>>640x480   59.94
>> DisplayPort-1 disconnected (normal left inverted right x axis y axis)
>> HDMI-A-0 disconnected (normal left inverted right

BUG: *ERROR* No EDID read

2018-07-02 Thread Daniel Andersson
I get:
[drm:dc_link_detect [amdgpu]] *ERROR* No EDID read.
on boot. This started happening on 4.17 and is still an issue on 4.18-rc2.

I have a Vega 64 connected over DisplayPort to a 1440p monitor. The monitor
doesn't have any additional ports, so I can't test HDMI.

xrandr:
Screen 0: minimum 320 x 200, current 1024 x 768, maximum 16384 x 16384
DisplayPort-0 connected 1024x768+0+0 (normal left inverted right x axis y
axis) 0mm x 0mm
   1024x768  60.00*
   800x600   60.32    56.25
   848x480   60.00
   640x480   59.94
DisplayPort-1 disconnected (normal left inverted right x axis y axis)
HDMI-A-0 disconnected (normal left inverted right x axis y axis)
HDMI-A-1 disconnected (normal left inverted right x axis y axis)

The 1440p resolution worked on 4.16.
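
To see from userspace which connector is affected, the kernel exposes each connector's status and raw EDID under /sys/class/drm. A minimal sketch that lists connected connectors with an empty EDID, assuming that sysfs layout (the function name is hypothetical, and connector names there can differ from the ones xrandr prints):

```shell
# Hypothetical helper: print every connected DRM connector whose EDID blob
# is empty. $1 is the directory to scan (default: /sys/class/drm).
connectors_without_edid() {
    base=${1:-/sys/class/drm}
    for dir in "$base"/card*-*; do
        # Skip entries that are not connectors (or an unmatched glob).
        [ -r "$dir/status" ] || continue
        [ "$(cat "$dir/status")" = "connected" ] || continue
        # Read the blob and count bytes rather than trusting the file size.
        bytes=$(cat "$dir/edid" 2>/dev/null | wc -c | tr -d ' ')
        if [ "$bytes" -eq 0 ]; then
            echo "${dir##*/}"
        fi
    done
    return 0
}
```

Calling `connectors_without_edid` with no argument scans the live system.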

Regards,
Daniel


[BUG] Intermittent hang/deadlock when opening browser tab with Vega gpu

2018-02-02 Thread Daniel Andersson
Hi,

I have an intermittent deadlock/hang in the amdgpu driver. It seems to
happen when I open a new tab in qutebrowser (v1.1.1) while I am doing other
things, like watching YouTube through mpv or playing Dota 2. How often it
happens is fairly arbitrary: sometimes once a week, sometimes several
times a day. I have a Vega 64.

What happens is that the screen freezes, but I still hear sound and can ssh
in to the box. If I reboot it remotely, I get dropped back to a tty and it
tries to reboot, but it gets stuck on blocking processes (mpv etc.), so I have
to reset it manually.
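
Since the box is still reachable over ssh while the screen is frozen, the hung-task reports can be summarised from there. A small sketch, assuming each report is on one line as dmesg prints it (the function name is made up for illustration):

```shell
# Hypothetical helper: read kernel log lines on stdin and print how many
# hung-task reports each task (name:pid) has accumulated.
hung_task_summary() {
    # Extract the "name:pid" part of every hung-task report.
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9]* seconds.*/\1/p' |
        awk '{ seen[$0]++ } END { for (t in seen) print seen[t], t }' |
        sort
}
```

It would be used as `dmesg | hung_task_summary`.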


Repro steps:

* Run qutebrowser
* Do a bunch of other stuff (videos, games, etc.)
* Switch back to qutebrowser, hit "Ctrl+t" and be "lucky"

This seems to happen on all release candidates for 4.15 and on 4.15 itself:

4.15:
[ 2211.463021] INFO: task amdgpu_cs:0:1053 blocked for more than 120
seconds.
[ 2211.463026]   Not tainted 4.15.0-ARCH+ #1
[ 2211.463028] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 2211.463030] amdgpu_cs:0 D0  1053   1051 0x
[ 2211.463032] Call Trace:
[ 2211.463040]  ? __schedule+0x297/0x8b0
[ 2211.463043]  schedule+0x2f/0x90
[ 2211.463045]  schedule_timeout+0x1fd/0x3a0
[ 2211.463085]  ? amdgpu_job_alloc+0x37/0xc0 [amdgpu]
[ 2211.463088]  dma_fence_default_wait+0x1cc/0x270
[ 2211.463090]  ? dma_fence_release+0xa0/0xa0
[ 2211.463092]  dma_fence_wait_timeout+0x39/0x110
[ 2211.463119]  amdgpu_ctx_wait_prev_fence+0x46/0x80 [amdgpu]
[ 2211.463145]  amdgpu_cs_ioctl+0x98/0x1ac0 [amdgpu]
[ 2211.463149]  ? dequeue_entity+0xdc/0x460
[ 2211.463174]  ? amdgpu_cs_find_mapping+0xc0/0xc0 [amdgpu]
[ 2211.463185]  drm_ioctl_kernel+0x5b/0xb0 [drm]
[ 2211.463194]  drm_ioctl+0x2ae/0x350 [drm]
[ 2211.463218]  ? amdgpu_cs_find_mapping+0xc0/0xc0 [amdgpu]
[ 2211.463239]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 2211.463243]  do_vfs_ioctl+0xa4/0x630
[ 2211.463246]  ? SyS_futex+0x12d/0x180
[ 2211.463248]  SyS_ioctl+0x74/0x80
[ 2211.463251]  entry_SYSCALL_64_fastpath+0x20/0x83
[ 2211.463254] RIP: 0033:0x7f21b27b6d87
[ 2211.463255] RSP: 002b:7f21a83acab8 EFLAGS: 0246
[ 2334.343027] INFO: task amdgpu_cs:0:1053 blocked for more than 120
seconds.
[ 2334.343032]   Not tainted 4.15.0-ARCH+ #1
[ 2334.343034] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 2334.343036] amdgpu_cs:0 D0  1053   1051 0x
[ 2334.343039] Call Trace:
[ 2334.343046]  ? __schedule+0x297/0x8b0
[ 2334.343049]  schedule+0x2f/0x90
[ 2334.343051]  schedule_timeout+0x1fd/0x3a0
[ 2334.343091]  ? amdgpu_job_alloc+0x37/0xc0 [amdgpu]
[ 2334.343095]  dma_fence_default_wait+0x1cc/0x270
[ 2334.343097]  ? dma_fence_release+0xa0/0xa0
[ 2334.343098]  dma_fence_wait_timeout+0x39/0x110
[ 2334.343125]  amdgpu_ctx_wait_prev_fence+0x46/0x80 [amdgpu]
[ 2334.343151]  amdgpu_cs_ioctl+0x98/0x1ac0 [amdgpu]
[ 2334.343155]  ? dequeue_entity+0xdc/0x460
[ 2334.343181]  ? amdgpu_cs_find_mapping+0xc0/0xc0 [amdgpu]
[ 2334.343191]  drm_ioctl_kernel+0x5b/0xb0 [drm]
[ 2334.343200]  drm_ioctl+0x2ae/0x350 [drm]
[ 2334.343224]  ? amdgpu_cs_find_mapping+0xc0/0xc0 [amdgpu]
[ 2334.343245]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 2334.343249]  do_vfs_ioctl+0xa4/0x630
[ 2334.343252]  ? SyS_futex+0x12d/0x180
[ 2334.343254]  SyS_ioctl+0x74/0x80
[ 2334.343257]  entry_SYSCALL_64_fastpath+0x20/0x83
[ 2334.343259] RIP: 0033:0x7f21b27b6d87
[ 2334.343260] RSP: 002b:7f21a83acab8 EFLAGS: 0246
[ 2457.222859] INFO: task amdgpu_cs:0:1053 blocked for more than 120
seconds.
[ 2457.222862]   Not tainted 4.15.0-ARCH+ #1
[ 2457.222863] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 2457.222864] amdgpu_cs:0 D0  1053   1051 0x
[ 2457.222866] Call Trace:
[ 2457.222872]  ? __schedule+0x297/0x8b0
[ 2457.222873]  schedule+0x2f/0x90
[ 2457.222875]  schedule_timeout+0x1fd/0x3a0
[ 2457.222900]  ? amdgpu_job_alloc+0x37/0xc0 [amdgpu]
[ 2457.222902]  dma_fence_default_wait+0x1cc/0x270
[ 2457.222903]  ? dma_fence_release+0xa0/0xa0
[ 2457.222904]  dma_fence_wait_timeout+0x39/0x110
[ 2457.222918]  amdgpu_ctx_wait_prev_fence+0x46/0x80 [amdgpu]
[ 2457.222932]  amdgpu_cs_ioctl+0x98/0x1ac0 [amdgpu]
[ 2457.222935]  ? dequeue_entity+0xdc/0x460
[ 2457.222948]  ? amdgpu_cs_find_mapping+0xc0/0xc0 [amdgpu]
[ 2457.222955]  drm_ioctl_kernel+0x5b/0xb0 [drm]
[ 2457.222960]  drm_ioctl+0x2ae/0x350 [drm]
[ 2457.222972]  ? amdgpu_cs_find_mapping+0xc0/0xc0 [amdgpu]
[ 2457.222983]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 2457.222986]  do_vfs_ioctl+0xa4/0x630
[ 2457.222989]  ? SyS_futex+0x12d/0x180
[ 2457.222989]  SyS_ioctl+0x74/0x80
[ 2457.222991]  entry_SYSCALL_64_fastpath+0x20/0x83
[ 2457.222993] RIP: 0033:0x7f21b27b6d87
[ 2457.222993] RSP: 002b:7f21a83acab8 EFLAGS: 0246
[ 2580.102828] INFO: task amdgpu_cs:0:1053 blocked for more than 120
seconds.
[ 2580.102831]   Not tainted 4.15.0-ARCH+ #1
[ 2580.102832] "echo 0 >