BUG: [amdgpu]] *ERROR* ring gfx timeout
Hey,

The GPU hangs for a while (~20 s) and dmesg shows:

[ 110.343379] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=7273, last emitted seq=7275
[ 110.343385] [drm] GPU recovery disabled.

This is on 4.18.8-arch1-1-ARCH with a Vega10.

Reproduce by:

git clone git://github.com/bkaradzic/bx.git
git clone git://github.com/bkaradzic/bimg.git
git clone git://github.com/bkaradzic/bgfx.git
cd bgfx
make -jX linux-release64
cd examples/runtime
../../.build/linux64_gcc/bin/examplesRelease

Then switch to example "37-gpudrivenrendering" from the drop-down menu.

/ Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: BUG: *ERROR* No EDID read
Hey,

I had some time to bisect again tonight and it seems 018d82e5f02ef3583411bcaa4e00c69786f46f19 got back in again through:

# first bad commit: [d98c71dadc2d0debdb80beb5a478baf1e6f98758] Merge drm-upstream/drm-next into drm-misc-next

/ Daniel

On Wed, 29 Aug 2018 at 16:02, Daniel Andersson wrote:
>
> Hey again,
>
> This is an issue for me yet again on 4.19-rc1:
>
> [5.743354] [drm:dc_link_detect [amdgpu]] *ERROR* No EDID read.
>
> / Daniel
>
> dmesg with drm.debug=0x4:
>
> [0.00] Linux version 4.19.0-rc1-ARCH (engy@sleipnir) (gcc version 8.2.0 (GCC)) #1 SMP PREEMPT Tue Aug 28 10:36:50 CEST 2018
> [0.00] Command line: BOOT_IMAGE=/vmlinuz-linuxtest root=UUID=27247597-a354-42f3-8040-caff9592a297 rw quiet drm.debug=0x4
> [0.00] KERNEL supported cpus:
> [0.00] AMD AuthenticAMD
> [0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
> [0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> [0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> [0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
> [0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
> [0.00] BIOS-provided physical RAM map:
> [0.00] BIOS-e820: [mem 0x-0x0009] usable
> [0.00] BIOS-e820: [mem 0x000a-0x000f] reserved
> [0.00] BIOS-e820: [mem 0x0010-0x09de] usable
> [0.00] BIOS-e820: [mem 0x09df-0x09ff] reserved
> [0.00] BIOS-e820: [mem 0x0a00-0xb75f] usable
> [0.00] BIOS-e820: [mem 0xb760-0xb8b4afff] reserved
> [0.00] BIOS-e820: [mem 0xb8b4b000-0xb8f2cfff] usable
> [0.00] BIOS-e820: [mem 0xb8f2d000-0xb902dfff] ACPI NVS
> [0.00] BIOS-e820: [mem 0xb902e000-0xb9cf6fff] reserved
> [0.00] BIOS-e820: [mem 0xb9cf7000-0xb9e16fff] type 20
> [0.00] BIOS-e820: [mem 0xb9e17000-0xbbff] usable
> [0.00] BIOS-e820: [mem 0xbc00-0xbfff] reserved
> [0.00] BIOS-e820: [mem 0xed00-0xed07] reserved
> [0.00] BIOS-e820: [mem 0xed0f-0xed0f0fff] reserved
> [0.00] BIOS-e820: [mem 0xef60-0xef67] reserved
> [0.00] BIOS-e820: [mem 0xef6f-0xef6f0fff] reserved
> [0.00] BIOS-e820: [mem 0xfea0-0xfec00fff] reserved
> [0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
> [0.00] BIOS-e820: [mem 0xfec3-0xfec30fff] reserved
> [0.00] BIOS-e820: [mem 0xfed0-0xfed00fff] reserved
> [0.00] BIOS-e820: [mem 0xfed4-0xfed44fff] reserved
> [0.00] BIOS-e820: [mem 0xfed8-0xfed8] reserved
> [0.00] BIOS-e820: [mem 0xfedc-0xfedc0fff] reserved
> [0.00] BIOS-e820: [mem 0xfedc2000-0xfedc] reserved
> [0.00] BIOS-e820: [mem 0xfedd4000-0xfedd5fff] reserved
> [0.00] BIOS-e820: [mem 0xfee0-0xfeef] reserved
> [0.00] BIOS-e820: [mem 0xff00-0x] reserved
> [0.00] BIOS-e820: [mem 0x0001-0x00083f2f] usable
> [0.00] BIOS-e820: [mem 0x00083f30-0x00083fff] reserved
> [0.00] NX (Execute Disable) protection: active
> [0.00] efi: EFI v2.60 by American Megatrends
> [0.00] efi: ACPI 2.0=0xb8f2d000 ACPI=0xb8f2d000 SMBIOS=0xb9c66000 SMBIOS 3.0=0xb9c65000 ESRT=0xb5ebef98
> [0.00] SMBIOS 3.0.0 present.
> [0.00] DMI: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P2.00 11/21/2017
> [0.00] tsc: Fast TSC calibration failed
> [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
> [0.00] e820: remove [mem 0x000a-0x000f] usable
> [0.00] last_pfn = 0x83f300 max_arch_pfn = 0x4
> [0.00] MTRR default type: uncachable
> [0.00] MTRR fixed ranges enabled:
> [0.00] 0-9 write-back
> [0.00] A-B write-through
> [0.00] C-D uncachable
> [0.00] E-F write-protect
> [0.00] MTRR variable ranges enabled:
> [0.00] 0 base mask 8000 write-back
> [0.00] 1 base 8000 mask C000 write-back
> [0.00] 2 base BC00 m
Re: BUG: *ERROR* No EDID read
Alright, I redid the bisect and this time ended up on 018d82e5f02ef3583411bcaa4e00c69786f46f19. If I revert it, it fixes my issue. Reverting it on 4.18-rc2 also works.

// Daniel

On 5 July 2018 at 21:47, Daniel Andersson wrote:
> [0.00] Command line: BOOT_IMAGE=/vmlinuz-linuxtest root=UUID=27247597-a354-42f3-8040-caff9592a297 drm.debug=0x4 rw quiet
> [0.00] Kernel command line: BOOT_IMAGE=/vmlinuz-linuxtest root=UUID=27247597-a354-42f3-8040-caff9592a297 drm.debug=0x4 rw quiet
> [5.674793] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.674833] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.674887] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.674930] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.674974] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675014] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675056] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675095] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675138] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675178] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675221] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675260] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675304] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675342] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675384] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675422] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675465] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675503] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675587] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
> [5.675626] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
>
> I don't really know what is going on. If I go back to 4.17 and apply my "fix", it doesn't work. I suppose my bisect didn't get me the right commit. I probably never tested that the commit from the bisect was actually bad. I was also a little lazy and did "bisect start 29dcea88779c856c7dc92040a0c01233263101d4 6da6c0db5316275015e8cc2959f12a17584aeb64 -- drivers/gpu/drm/amd".
>
> I guess I'll try another bisect tomorrow on the entire tree, sorry for the extra work.
>
> // Daniel
>
> On 5 July 2018 at 20:22, Harry Wentland wrote:
>> On 2018-07-05 01:43 PM, Daniel Andersson wrote:
>>> Well, this workaround:
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>>> index 10a5807a7e8b..d0f5910c906c 100644
>>> --- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>>> +++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>>> @@ -1321,6 +1321,8 @@ static enum bp_result bios_parser_get_firmware_info(
>>> header = GET_IMAGE(struct atom_common_table_header, DATA_TABLES(firmwareinfo));
>>> get_atom_data_table_revision(header, &revision);
>>> +dm_output_to_console("At bios_parser_get_firmware_info switch, got major %d minor %d", revision.major, revision.minor);
>>> +dm_error("At bios_parser_get_firmware_info switch, got major %d minor %d", revision.major, revision.minor);
>>> switch (revision.major) {
>>> case 3:
>>> switch (revision.minor) {
>>> @@ -1328,7 +1330,7 @@ static enum bp_result b
Re: BUG: *ERROR* No EDID read
[0.00] Command line: BOOT_IMAGE=/vmlinuz-linuxtest root=UUID=27247597-a354-42f3-8040-caff9592a297 drm.debug=0x4 rw quiet
[0.00] Kernel command line: BOOT_IMAGE=/vmlinuz-linuxtest root=UUID=27247597-a354-42f3-8040-caff9592a297 drm.debug=0x4 rw quiet
[5.674793] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.674833] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.674887] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.674930] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.674974] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675014] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675056] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675095] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675138] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675178] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675221] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675260] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675304] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675342] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675384] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675422] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675465] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675503] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675587] [drm:bios_parser_get_firmware_info [amdgpu]] At bios_parser_get_firmware_info switch, got major 3 minor 1
[5.675626] [drm:bios_parser_get_firmware_info [amdgpu]] *ERROR* At bios_parser_get_firmware_info switch, got major 3 minor 1

I don't really know what is going on. If I go back to 4.17 and apply my "fix", it doesn't work. I suppose my bisect didn't get me the right commit. I probably never tested that the commit from the bisect was actually bad. I was also a little lazy and did "bisect start 29dcea88779c856c7dc92040a0c01233263101d4 6da6c0db5316275015e8cc2959f12a17584aeb64 -- drivers/gpu/drm/amd".

I guess I'll try another bisect tomorrow on the entire tree, sorry for the extra work.

// Daniel

On 5 July 2018 at 20:22, Harry Wentland wrote:
> On 2018-07-05 01:43 PM, Daniel Andersson wrote:
>> Well, this workaround:
>>
>> diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>> index 10a5807a7e8b..d0f5910c906c 100644
>> --- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>> +++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
>> @@ -1321,6 +1321,8 @@ static enum bp_result bios_parser_get_firmware_info(
>> header = GET_IMAGE(struct atom_common_table_header, DATA_TABLES(firmwareinfo));
>> get_atom_data_table_revision(header, &revision);
>> +dm_output_to_console("At bios_parser_get_firmware_info switch, got major %d minor %d", revision.major, revision.minor);
>> +dm_error("At bios_parser_get_firmware_info switch, got major %d minor %d", revision.major, revision.minor);
>> switch (revision.major) {
>> case 3:
>> switch (revision.minor) {
>> @@ -1328,7 +1330,7 @@ static enum bp_result bios_parser_get_firmware_info(
>> result = get_firmware_info_v3_1(bp, info);
>> break;
>> case 2:
>> - result = get_firmware_info_v3_2(bp, info);
>> + result = get_firmware_info_v3_1(bp, info);
>> break;
>> default:
>> break;
>>
>> "works":
>> [engy][~/devel/3pp/linux] ((6e65fb862064...)|BISECTING)$ xrandr
>> Screen 0: minimum 320 x 200, current 2560 x 1440, maximum 16384 x 16384
>> DisplayPort-0 connected 2560x1440+0+0 (normal left inverted right x axis y axis) 598mm x 336mm
>>
Re: BUG: *ERROR* No EDID read
Well, this workaround:

diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
index 10a5807a7e8b..d0f5910c906c 100644
--- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
+++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
@@ -1321,6 +1321,8 @@ static enum bp_result bios_parser_get_firmware_info(
 header = GET_IMAGE(struct atom_common_table_header, DATA_TABLES(firmwareinfo));
 get_atom_data_table_revision(header, &revision);
+dm_output_to_console("At bios_parser_get_firmware_info switch, got major %d minor %d", revision.major, revision.minor);
+dm_error("At bios_parser_get_firmware_info switch, got major %d minor %d", revision.major, revision.minor);
 switch (revision.major) {
 case 3:
 switch (revision.minor) {
@@ -1328,7 +1330,7 @@ static enum bp_result bios_parser_get_firmware_info(
 result = get_firmware_info_v3_1(bp, info);
 break;
 case 2:
- result = get_firmware_info_v3_2(bp, info);
+ result = get_firmware_info_v3_1(bp, info);
 break;
 default:
 break;

"works":
[engy][~/devel/3pp/linux] ((6e65fb862064...)|BISECTING)$ xrandr
Screen 0: minimum 320 x 200, current 2560 x 1440, maximum 16384 x 16384
DisplayPort-0 connected 2560x1440+0+0 (normal left inverted right x axis y axis) 598mm x 336mm
   2560x1440     59.95*+ 120.00   99.95    84.98    23.97
   1920x1200     59.95
   1920x1080     59.95
   1600x1200     59.95
   1680x1050     59.95
   1280x1024     59.95
   1440x900      59.95
   1280x800      59.95
   1280x720      59.95
   1024x768      59.95
   800x600       59.95
   640x480       59.95
DisplayPort-1 disconnected (normal left inverted right x axis y axis)
HDMI-A-0 disconnected (normal left inverted right x axis y axis)
HDMI-A-1 disconnected (normal left inverted right x axis y axis)

Where do dm_error and dm_output_to_console end up?

// Daniel

On 5 July 2018 at 18:42, Deucher, Alexander wrote:
> So your vbios has table v3.1 so it should not be affected by that patch.
> Does reverting that patch actually fix the issue?
>
> Alex
>
> ________
> From: amd-gfx on behalf of Daniel Andersson
> Sent: Thursday, July 5, 2018 12:22:17 PM
> To: Alex Deucher
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: BUG: *ERROR* No EDID read
>
> I have not flashed any GPU BIOS. It's not a reference Vega though, Sapphire something. Maybe they made changes?
>
> vbios is attached.
>
> // Daniel
>
> On 5 July 2018 at 15:38, Alex Deucher wrote:
>> On Mon, Jul 2, 2018 at 5:39 PM, Daniel Andersson wrote:
>>> Sure, bisecting gets me 6e65fb862064663ad3a08f964af1e8f3f2abf688.
>>>
>>> In drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c, get_firmware_info_v3_1() works but get_firmware_info_v3_2() does not do the right thing for my Vega.
>>>
>>> Could I break my GPU if I were to set some bad/wrong frequency there?
>>
>> vega10 should not hit that new path at all. Have you edited your vbios? Can you send us a copy? To get a copy of the vbios:
>>
>> Without the driver loaded:
>> (as root)
>> (use lspci to get the bus id)
>> cd /sys/bus/pci/devices/
>> echo 1 > rom
>> cat rom > /tmp/vbios.rom
>> echo 0 > rom
>>
>> If the driver is loaded:
>> (as root)
>> cat /sys/kernel/debug/dri/0/amdgpu_vbios > /tmp/vbios.rom
>>
>> Alex
>>
>>>
>>> lspci:
>>> 43:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c1) (prog-if 00 [VGA controller])
>>> Subsystem: Sapphire Technology Limited Vega 10 XT [Radeon RX Vega 64]
>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
>>> Latency: 0, Cache Line Size: 64 bytes
>>> Interrupt: pin A routed to IRQ 85
>>> Region 0: Memory at d000 (64-bit, prefetchable) [size=256M]
>>> Region 2: Memory at e000 (64-bit, prefetchable) [size=2M]
>>> Region 4: I/O ports at f000 [size=256]
>>> Region 5: Memory at ed40 (32-bit, non-prefetchable) [size=512K]
>>> Expansion ROM at ed48 [disabled] [size=128K]
>>> Capabilities: [48] Vendor Specific Information: Len=08
>>> Capabilities: [50] Power Management version 3
>>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>> Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
>>>
Re: BUG: *ERROR* No EDID read
I have not flashed any GPU BIOS. It's not a reference Vega though, Sapphire something. Maybe they made changes?

vbios is attached.

// Daniel

On 5 July 2018 at 15:38, Alex Deucher wrote:
> On Mon, Jul 2, 2018 at 5:39 PM, Daniel Andersson wrote:
>> Sure, bisecting gets me 6e65fb862064663ad3a08f964af1e8f3f2abf688.
>>
>> In drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c, get_firmware_info_v3_1() works but get_firmware_info_v3_2() does not do the right thing for my Vega.
>>
>> Could I break my GPU if I were to set some bad/wrong frequency there?
>
> vega10 should not hit that new path at all. Have you edited your vbios? Can you send us a copy? To get a copy of the vbios:
>
> Without the driver loaded:
> (as root)
> (use lspci to get the bus id)
> cd /sys/bus/pci/devices/
> echo 1 > rom
> cat rom > /tmp/vbios.rom
> echo 0 > rom
>
> If the driver is loaded:
> (as root)
> cat /sys/kernel/debug/dri/0/amdgpu_vbios > /tmp/vbios.rom
>
> Alex
>
>>
>> lspci:
>> 43:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c1) (prog-if 00 [VGA controller])
>> Subsystem: Sapphire Technology Limited Vega 10 XT [Radeon RX Vega 64]
>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
>> Latency: 0, Cache Line Size: 64 bytes
>> Interrupt: pin A routed to IRQ 85
>> Region 0: Memory at d000 (64-bit, prefetchable) [size=256M]
>> Region 2: Memory at e000 (64-bit, prefetchable) [size=2M]
>> Region 4: I/O ports at f000 [size=256]
>> Region 5: Memory at ed40 (32-bit, non-prefetchable) [size=512K]
>> Expansion ROM at ed48 [disabled] [size=128K]
>> Capabilities: [48] Vendor Specific Information: Len=08
>> Capabilities: [50] Power Management version 3
>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>> Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
>> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
>> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
>> MaxPayload 256 bytes, MaxReadReq 512 bytes
>> DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
>> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
>> LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
>> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>> LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>> DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Not Supported
>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>> AtomicOpsCtl: ReqEn-
>> LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>> Compliance De-emphasis: -6dB
>> LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
>> EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
>> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>> Address: fee0 Data:
>> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010
>> Capabilities: [150 v2] Advanced Error Reporting
>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>> HeaderLog:
>> Capa
Re: BUG: *ERROR* No EDID read
Sure, bisecting gets me 6e65fb862064663ad3a08f964af1e8f3f2abf688.

In drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c, get_firmware_info_v3_1() works but get_firmware_info_v3_2() does not do the right thing for my Vega.

Could I break my GPU if I were to set some bad/wrong frequency there?

lspci:
43:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c1) (prog-if 00 [VGA controller])
Subsystem: Sapphire Technology Limited Vega 10 XT [Radeon RX Vega 64]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 85
Region 0: Memory at d000 (64-bit, prefetchable) [size=256M]
Region 2: Memory at e000 (64-bit, prefetchable) [size=2M]
Region 4: I/O ports at f000 [size=256]
Region 5: Memory at ed40 (32-bit, non-prefetchable) [size=512K]
Expansion ROM at ed48 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Not Supported
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: fee0 Data:
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010
Capabilities: [150 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog:
Capabilities: [200 v1] #15
Capabilities: [270 v1] #19
Capabilities: [2a0 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [2b0 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable+, Smallest Translation Unit: 00
Capabilities: [2c0 v1] Page Request Interface (PRI)
PRICtl: Enable- Reset-
PRISta: RF- UPRGI- Stopped+
Page Request Capacity: 0020, Page Request Allocation:
Capabilities: [2d0 v1] Process Address Space ID (PASID)
PASIDCap: Exec+ Priv+, Max PASID Width: 10
PASIDCtl: Enable- Exec- Priv-
Capabilities: [320 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel driver in use: amdgpu
Kernel modules: amdgpu

// Daniel

On 2 July 2018 at 21:29, Alex Deucher wrote:
> On Mon, Jul 2, 2018 at 3:21 PM, Daniel Andersson wrote:
>> I get:
>> [drm:dc_link_detect [amdgpu]] *ERROR* No EDID read.
>> on boot. This started happening on 4.17 and is still an issue on 4.18-rc2.
>>
>> I have a Vega64 connected over DisplayPort to a monitor (1440p). The monitor doesn't have any additional ports, so I can't test HDMI.
>>
>> xrandr:
>> Screen 0: minimum 320 x 200, current 1024 x 768, maximum 16384 x 16384
>> DisplayPort-0 connected 1024x768+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
>>    1024x768      60.00*
>>    800x600       60.32    56.25
>>    848x480       60.00
>>    640x480       59.94
>> DisplayPort-1 disconnected (normal left inverted right x axis y axis)
>> HDMI-A-0 disconnected (normal left inverted right
BUG: *ERROR* No EDID read
I get:
[drm:dc_link_detect [amdgpu]] *ERROR* No EDID read.
on boot. This started happening on 4.17 and is still an issue on 4.18-rc2.

I have a Vega64 connected over DisplayPort to a monitor (1440p). The monitor doesn't have any additional ports, so I can't test HDMI.

xrandr:
Screen 0: minimum 320 x 200, current 1024 x 768, maximum 16384 x 16384
DisplayPort-0 connected 1024x768+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
   1024x768      60.00*
   800x600       60.32    56.25
   848x480       60.00
   640x480       59.94
DisplayPort-1 disconnected (normal left inverted right x axis y axis)
HDMI-A-0 disconnected (normal left inverted right x axis y axis)
HDMI-A-1 disconnected (normal left inverted right x axis y axis)

The 1440p resolution worked on 4.16.

Regards,
Daniel
[BUG] Intermittent hang/deadlock when opening browser tab with Vega gpu
Hi,

I have an intermittent deadlock/hang in the amdgpu driver. It seems to happen when I open a new tab in qutebrowser (v1.1.1) while I am doing other stuff, like watching YouTube through mpv or playing Dota 2. How often it happens seems pretty arbitrary: sometimes once a week, sometimes multiple times a day. I have a Vega 64.

What happens is that the screen freezes, but I still hear sound and can ssh in to the box. If I reboot it remotely, I get dropped back to a tty and it tries to reboot, but it gets stuck on blocking processes (mpv etc.), so I have to reset it manually.

Repro steps:
* Run qutebrowser
* Do a bunch of other stuff: videos, games etc.
* Switch back to qutebrowser, hit "Ctrl+t" and be "lucky"

This seems to happen on all release candidates for 4.15 and on 4.15 itself:

4.15:
[ 2211.463021] INFO: task amdgpu_cs:0:1053 blocked for more than 120 seconds.
[ 2211.463026] Not tainted 4.15.0-ARCH+ #1
[ 2211.463028] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2211.463030] amdgpu_cs:0 D0 1053 1051 0x
[ 2211.463032] Call Trace:
[ 2211.463040] ? __schedule+0x297/0x8b0
[ 2211.463043] schedule+0x2f/0x90
[ 2211.463045] schedule_timeout+0x1fd/0x3a0
[ 2211.463085] ? amdgpu_job_alloc+0x37/0xc0 [amdgpu]
[ 2211.463088] dma_fence_default_wait+0x1cc/0x270
[ 2211.463090] ? dma_fence_release+0xa0/0xa0
[ 2211.463092] dma_fence_wait_timeout+0x39/0x110
[ 2211.463119] amdgpu_ctx_wait_prev_fence+0x46/0x80 [amdgpu]
[ 2211.463145] amdgpu_cs_ioctl+0x98/0x1ac0 [amdgpu]
[ 2211.463149] ? dequeue_entity+0xdc/0x460
[ 2211.463174] ? amdgpu_cs_find_mapping+0xc0/0xc0 [amdgpu]
[ 2211.463185] drm_ioctl_kernel+0x5b/0xb0 [drm]
[ 2211.463194] drm_ioctl+0x2ae/0x350 [drm]
[ 2211.463218] ? amdgpu_cs_find_mapping+0xc0/0xc0 [amdgpu]
[ 2211.463239] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 2211.463243] do_vfs_ioctl+0xa4/0x630
[ 2211.463246] ? SyS_futex+0x12d/0x180
[ 2211.463248] SyS_ioctl+0x74/0x80
[ 2211.463251] entry_SYSCALL_64_fastpath+0x20/0x83
[ 2211.463254] RIP: 0033:0x7f21b27b6d87
[ 2211.463255] RSP: 002b:7f21a83acab8 EFLAGS: 0246
[ 2334.343027] INFO: task amdgpu_cs:0:1053 blocked for more than 120 seconds.
[ 2334.343032] Not tainted 4.15.0-ARCH+ #1
[ 2334.343034] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2334.343036] amdgpu_cs:0 D0 1053 1051 0x
[ 2334.343039] Call Trace:
[ 2334.343046] ? __schedule+0x297/0x8b0
[ 2334.343049] schedule+0x2f/0x90
[ 2334.343051] schedule_timeout+0x1fd/0x3a0
[ 2334.343091] ? amdgpu_job_alloc+0x37/0xc0 [amdgpu]
[ 2334.343095] dma_fence_default_wait+0x1cc/0x270
[ 2334.343097] ? dma_fence_release+0xa0/0xa0
[ 2334.343098] dma_fence_wait_timeout+0x39/0x110
[ 2334.343125] amdgpu_ctx_wait_prev_fence+0x46/0x80 [amdgpu]
[ 2334.343151] amdgpu_cs_ioctl+0x98/0x1ac0 [amdgpu]
[ 2334.343155] ? dequeue_entity+0xdc/0x460
[ 2334.343181] ? amdgpu_cs_find_mapping+0xc0/0xc0 [amdgpu]
[ 2334.343191] drm_ioctl_kernel+0x5b/0xb0 [drm]
[ 2334.343200] drm_ioctl+0x2ae/0x350 [drm]
[ 2334.343224] ? amdgpu_cs_find_mapping+0xc0/0xc0 [amdgpu]
[ 2334.343245] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 2334.343249] do_vfs_ioctl+0xa4/0x630
[ 2334.343252] ? SyS_futex+0x12d/0x180
[ 2334.343254] SyS_ioctl+0x74/0x80
[ 2334.343257] entry_SYSCALL_64_fastpath+0x20/0x83
[ 2334.343259] RIP: 0033:0x7f21b27b6d87
[ 2334.343260] RSP: 002b:7f21a83acab8 EFLAGS: 0246
[ 2457.222859] INFO: task amdgpu_cs:0:1053 blocked for more than 120 seconds.
[ 2457.222862] Not tainted 4.15.0-ARCH+ #1
[ 2457.222863] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2457.222864] amdgpu_cs:0 D0 1053 1051 0x
[ 2457.222866] Call Trace:
[ 2457.222872] ? __schedule+0x297/0x8b0
[ 2457.222873] schedule+0x2f/0x90
[ 2457.222875] schedule_timeout+0x1fd/0x3a0
[ 2457.222900] ? amdgpu_job_alloc+0x37/0xc0 [amdgpu]
[ 2457.222902] dma_fence_default_wait+0x1cc/0x270
[ 2457.222903] ? dma_fence_release+0xa0/0xa0
[ 2457.222904] dma_fence_wait_timeout+0x39/0x110
[ 2457.222918] amdgpu_ctx_wait_prev_fence+0x46/0x80 [amdgpu]
[ 2457.222932] amdgpu_cs_ioctl+0x98/0x1ac0 [amdgpu]
[ 2457.222935] ? dequeue_entity+0xdc/0x460
[ 2457.222948] ? amdgpu_cs_find_mapping+0xc0/0xc0 [amdgpu]
[ 2457.222955] drm_ioctl_kernel+0x5b/0xb0 [drm]
[ 2457.222960] drm_ioctl+0x2ae/0x350 [drm]
[ 2457.222972] ? amdgpu_cs_find_mapping+0xc0/0xc0 [amdgpu]
[ 2457.222983] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 2457.222986] do_vfs_ioctl+0xa4/0x630
[ 2457.222989] ? SyS_futex+0x12d/0x180
[ 2457.222989] SyS_ioctl+0x74/0x80
[ 2457.222991] entry_SYSCALL_64_fastpath+0x20/0x83
[ 2457.222993] RIP: 0033:0x7f21b27b6d87
[ 2457.222993] RSP: 002b:7f21a83acab8 EFLAGS: 0246
[ 2580.102828] INFO: task amdgpu_cs:0:1053 blocked for more than 120 seconds.
[ 2580.102831] Not tainted 4.15.0-ARCH+ #1
[ 2580.102832] "echo 0 >