[Bug 1902981] Re: AGP GPU on PCI mode (when AGP is disabled at kernel build time) known to fail on K8 and K10 platforms
On a side note, because we see a clear difference in behaviour when applying the PCI patch, we can assume that the driver takes the `rdev->flags & RADEON_IS_PCI` branch rather than the `rdev->flags & RADEON_IS_AGP` one when running an AGP GPU with AGP disabled in the kernel at build time.

** Changed in: linux (Ubuntu)
Status: Incomplete => Confirmed

** Tags added: amd64 focal kernel-bug

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1902981

Title:
AGP GPU on PCI mode (when AGP is disabled at kernel build time) known to fail on K8 and K10 platforms

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1902981/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1902981] Re: AGP GPU on PCI mode (when AGP is disabled at kernel build time) known to fail on K8 and K10 platforms
It looks like comment #3 was truncated. The interesting part of the dmesg log that is missing is:

```
[ 66.755306] radeon :01:00.0: ring 0 stalled for more than 31248msec
[ 66.755317] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 66.840372] radeon :01:00.0: Saved 25 dwords of commands on ring 0.
[ 66.840402] radeon :01:00.0: GPU softreset: 0x0019
[ 66.840408] radeon :01:00.0: R_008010_GRBM_STATUS = 0xA27034A1
[ 66.840414] radeon :01:00.0: R_008014_GRBM_STATUS2 = 0x0102
[ 66.840419] radeon :01:00.0: R_000E50_SRBM_STATUS = 0x200028C0
[ 66.840424] radeon :01:00.0: R_008674_CP_STALLED_STAT1 = 0x0400
[ 66.840429] radeon :01:00.0: R_008678_CP_STALLED_STAT2 = 0x00010100
[ 66.840434] radeon :01:00.0: R_00867C_CP_BUSY_STAT = 0x8C80
[ 66.840438] radeon :01:00.0: R_008680_CP_STAT = 0x808182E7
[ 66.840443] radeon :01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[ 67.364934] radeon :01:00.0: Wait for MC idle timedout !
[ 67.364940] radeon :01:00.0: R_008020_GRBM_SOFT_RESET=0x7F6B
[ 67.365005] radeon :01:00.0: SRBM_SOFT_RESET=0x0100
[ 67.367106] radeon :01:00.0: R_008010_GRBM_STATUS = 0x3028
[ 67.367110] radeon :01:00.0: R_008014_GRBM_STATUS2 = 0x0002
[ 67.367114] radeon :01:00.0: R_000E50_SRBM_STATUS = 0x200028C0
[ 67.367118] radeon :01:00.0: R_008674_CP_STALLED_STAT1 = 0x
[ 67.367122] radeon :01:00.0: R_008678_CP_STALLED_STAT2 = 0x
[ 67.367126] radeon :01:00.0: R_00867C_CP_BUSY_STAT = 0x
[ 67.367130] radeon :01:00.0: R_008680_CP_STAT = 0x
[ 67.367134] radeon :01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[ 67.367152] radeon :01:00.0: GPU reset succeeded, trying to resume
[ 67.842179] radeon :01:00.0: Wait for MC idle timedout !
[ 68.068765] radeon :01:00.0: Wait for MC idle timedout !
[ 68.082273] [drm] PCIE GART of 1024M enabled (table at 0x0014C000).
[ 68.082448] radeon :01:00.0: WB enabled
[ 68.082454] radeon :01:00.0: fence driver on ring 0 use gpu addr 0x4c00
[ 68.082459] radeon :01:00.0: fence driver on ring 3 use gpu addr 0x4c0c
[ 68.088977] radeon :01:00.0: fence driver on ring 5 use gpu addr 0x0005c598
[ 68.374095] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[ 68.374176] [drm:rv770_resume [radeon]] *ERROR* r600 startup failed on resume
```

This is what happens when applying the patch that forces a 32-bit DMA bit mask on PCI devices.
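For readers unfamiliar with what a 32-bit DMA bit mask means in practice, here is a minimal Python sketch. It only mirrors the arithmetic of the kernel's `DMA_BIT_MASK(n)` macro (the n low bits set); it is not the patch itself. The helper names `dma_bit_mask` and `is_addressable`, the sample addresses, and the 40-bit mask used for contrast are assumptions made for illustration.

```python
def dma_bit_mask(bits: int) -> int:
    """Return a mask with the `bits` low bits set, like DMA_BIT_MASK(n)."""
    if not 1 <= bits <= 64:
        raise ValueError("bits must be between 1 and 64")
    return (1 << bits) - 1


def is_addressable(bus_addr: int, size: int, mask: int) -> bool:
    """True if the whole buffer [bus_addr, bus_addr + size) fits under the mask."""
    return bus_addr + size - 1 <= mask


MASK_32 = dma_bit_mask(32)  # what the patch is described as forcing
MASK_40 = dma_bit_mask(40)  # hypothetical wider mask, for contrast only

# Hypothetical buffers: one below 4 GiB, one above it (possible on a 16GB host).
LOW_BUFFER = 0x0000_1000
HIGH_BUFFER = 0x1_2000_0000
PAGE = 4096

for name, addr in (("low", LOW_BUFFER), ("high", HIGH_BUFFER)):
    print(
        f"{name} buffer at 0x{addr:x}: "
        f"32-bit ok={is_addressable(addr, PAGE, MASK_32)}, "
        f"40-bit ok={is_addressable(addr, PAGE, MASK_40)}"
    )
```

With a 32-bit mask, any buffer the device must reach has to sit below 4 GiB, which is why the mask matters on hosts with more memory than that.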
[Bug 1902981] Re: AGP GPU on PCI mode (when AGP is disabled at kernel build time) known to fail on K8 and K10 platforms
To get a better picture of the performance of such a top-of-the-line AGP GPU, we can compare it to other GPUs on the Unvanquished GPU compatibility matrix:

https://wiki.unvanquished.net/wiki/GPU_compatibility_matrix

There we can see that the ATI Radeon HD 4670 AGP (RV730 XT, TeraScale 1) performs:

- better than the PCI Express ATI Radeon HD 7450 from Q1 2012 (RV910, Caicos, TeraScale 2),
- like the mobile Nvidia GeForce GT 740M from Q2 2013 with the nvidia driver (NVE7, GK107M, Kepler),
- like the mobile Quadro K1100M from Q3 2013 with the nvidia driver (NVE7, GK107GLM, Kepler),
- like the integrated Intel HD 4600 from Q1 2014 (i7-4810MQ, Haswell, Gen7 GT2),
- like the integrated Intel HD 520 from Q3 2015 (i3-6100U, Skylake, Gen9 GT2),
- like the PCI Express GeForce GTX 1050 Ti from Q4 2016 when running the nouveau driver (Pascal).

On the Nvidia side, outperforming this GPU on Linux with the free and open source nouveau driver requires at least a GeForce GTX 1060 from 2016 (NV136, GP106-300-A1, Pascal). Intel users may have had to wait for the UHD 600 series (2016) to outperform this ATI AGP GPU. To this day, the first verified Intel GPU known to outperform this ATI AGP GPU is the UHD 620 from Q3 2019.
[Bug 1902981] Re: AGP GPU on PCI mode (when AGP is disabled at kernel build time) known to fail on K8 and K10 platforms
When applying the patch from https://bugs.launchpad.net/bugs/1902795 :

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1902795/+attachment/5431335/+files/0001-drm-radeon-make-all-PCI-GPUs-use-32bits-DMA-bit-mask.patch

which reduces (but does not completely fix) the issues faced with PCI GPUs on K8 and K10 hosts by setting the DMA bit mask to 32 bits for all PCI GPUs, we can see that what is fixed on PCI GPUs is not fixed on AGP-as-PCI GPUs (and there are even more errors before that):

```
[ 5.242322] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
```

Things go so wrong that we do not even see the other errors that are expected after that:

```
[ 5.242359] radeon :01:00.0: disabling GPU acceleration
```

```
[ 34.558889] trying to bind memory to uninitialized GART !
```

Instead, the kernel loops before reaching those errors, trying desperately to pass the r600_ring_test step.

But before the r600_ring_test failure message is printed, more and newer issues about ring 0 being stalled and GPU lockup occur with AGP-as-PCI GPUs that are never seen with PCI-native GPUs, especially considering that PCI GPUs can at least pass the r600_ring_test with the patch. Also, after the r600_ring_test failure message, instead of getting the message telling that GPU acceleration is disabled, we get a new message about r600 startup failing on resume.

This is why it is believed that fixing PCI GPUs may not be enough to fix AGP GPUs running as PCI ones when AGP is disabled at kernel build time.
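To help read the `scratch(0x8504)=0xCAFEDEAD` value in the ring test message above: as far as I understand the radeon source, the ring test first writes a sentinel (0xCAFEDEAD) to a scratch register from the CPU, then queues a command on the ring asking the GPU to overwrite it with 0xDEADBEEF, and polls until the new value shows up or a timeout expires. The following Python model is only a sketch of that idea, not the driver code; `FakeGpu`, `ring_test` and the poll count are hypothetical names and values chosen for illustration.

```python
SENTINEL = 0xCAFEDEAD  # value written directly by the CPU before the test
EXPECTED = 0xDEADBEEF  # value the GPU should write when it executes the ring


class FakeGpu:
    """Toy stand-in for a GPU with one scratch register and one ring."""

    def __init__(self, ring_works: bool):
        self.ring_works = ring_works
        self.scratch = 0
        self.ring = []

    def write_register(self, value: int) -> None:
        # Direct register write from the CPU side.
        self.scratch = value

    def submit(self, value: int) -> None:
        # Queue a "write this value to the scratch register" command.
        self.ring.append(value)

    def process_ring(self) -> None:
        # A stalled ring never consumes its commands.
        if self.ring_works:
            while self.ring:
                self.scratch = self.ring.pop(0)


def ring_test(gpu: FakeGpu, max_polls: int = 100):
    """Return (passed, final scratch value) for the toy ring test."""
    gpu.write_register(SENTINEL)
    gpu.submit(EXPECTED)
    for _ in range(max_polls):
        gpu.process_ring()
        if gpu.scratch == EXPECTED:
            return True, gpu.scratch
    return False, gpu.scratch


for label, gpu in (("working", FakeGpu(True)), ("stalled", FakeGpu(False))):
    passed, scratch = ring_test(gpu)
    print(f"{label} ring: passed={passed}, scratch=0x{scratch:08X}")
```

Under this reading, a failure that still reports 0xCAFEDEAD means the sentinel was never overwritten, so the commands queued on ring 0 were apparently never executed. That would be consistent with the ring 0 stall and GPU lockup messages.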
Here are the issues that are only seen with AGP-as-PCI GPUs, occurring before and after the r600_ring_test failure message:

```
[ 45.763336] radeon :01:00.0: ring 0 stalled for more than 10256msec
[ 45.763349] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 46.275324] radeon :01:00.0: ring 0 stalled for more than 10768msec
[ 46.275335] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 46.787322] radeon :01:00.0: ring 0 stalled for more than 11280msec
[ 46.787332] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 47.299336] radeon :01:00.0: ring 0 stalled for more than 11792msec
[ 47.299346] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 47.811320] radeon :01:00.0: ring 0 stalled for more than 12304msec
[ 47.811332] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 48.323331] radeon :01:00.0: ring 0 stalled for more than 12816msec
[ 48.323344] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 48.835307] radeon :01:00.0: ring 0 stalled for more than 13328msec
[ 48.835318] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 49.347328] radeon :01:00.0: ring 0 stalled for more than 13840msec
[ 49.347341] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 49.859316] radeon :01:00.0: ring 0 stalled for more than 14352msec
[ 49.859326] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 50.371471] radeon :01:00.0: ring 0 stalled for more than 14864msec
[ 50.371483] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 50.883318] radeon :01:00.0: ring 0 stalled for more than 15376msec
[ 50.883328] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 51.395315] radeon :01:00.0: ring 0 stalled for more than 15888msec
[ 51.395327] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 51.907325] radeon :01:00.0: ring 0 stalled for more than 16400msec
[ 51.907338] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 52.419319] radeon :01:00.0: ring 0 stalled for more than 16912msec
[ 52.419330] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 52.931321] radeon :01:00.0: ring 0 stalled for more than 17424msec
[ 52.931331] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 53.443321] radeon :01:00.0: ring 0 stalled for more than 17936msec
[ 53.44] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[ 53.955335]
```
[Bug 1902981] Re: AGP GPU on PCI mode (when AGP is disabled at kernel build time) known to fail on K8 and K10 platforms
As a reminder, this is a dmesg captured when running an ATI Radeon HD 4670 AGP on a K10 host on Linux 5.9 (vanilla).

The ATI Radeon HD 4670 AGP (RV730 XT) is a very capable TeraScale GPU, supporting OpenGL 3.3 (DirectX 10 on Windows) and OpenCL 1.0, and featuring HDMI output and 1GB of VRAM. The host is also very capable, with an AMD Phenom II quad core CPU and 16GB of RAM.

To verify whether its performance matches 2020 expectations, I just entered it (running Ubuntu 20.04) in the 2020 Xonotic Defrag World Championship, which is currently running (https://xdwc.teichisma.info/), and I got feedback from some players reporting that this hardware may be better than the hardware they compete with. In fact, competitive games like Xonotic run at 144fps at 1920×1080 resolution.

The last kernel able to drive this GPU on Ubuntu 20.04 LTS is 5.4.0-47-generic; 5.4.0-48-generic is believed to have backported the AGP disablement from 5.9-rc1 (ba806f9).

So, when running the 5.4.0-48-generic kernel from the Ubuntu repositories or, as here, a vanilla 5.9 kernel compiled by myself, the interesting parts of the dmesg log may be:

```
[5.242322] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[5.242359] radeon :01:00.0: disabling GPU acceleration
```

and:

```
[ 34.558889] trying to bind memory to uninitialized GART !
[ 34.559048] WARNING: CPU: 1 PID: 2516 at drivers/gpu/drm/radeon/radeon_gart.c:299 radeon_gart_bind+0xdf/0xf0 [radeon]
[ 34.559050] Modules linked in: zram snd_usb_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_usbmidi_lib snd_hda_core snd_hwdep snd_pcm snd_seq_midi kvm_amd snd_seq_midi_event ccp joydev kvm snd_seq snd_rawmidi input_leds snd_timer snd_seq_device snd soundcore k10temp mac_hid serio_raw binfmt_misc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear uas usb_storage hid_generic usbhid hid radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm psmouse forcedeth i2c_nforce2
[ 34.559107] CPU: 1 PID: 2516 Comm: gnome-shell Not tainted 5.9.0 #1
[ 34.559109] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./AM2NF3-VSTA, BIOS P3.20 10/09/2009
[ 34.559178] RIP: 0010:radeon_gart_bind+0xdf/0xf0 [radeon]
[ 34.559184] Code: 00 48 89 ef 48 8b 40 60 e8 0e 2f 44 df 31 c0 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 38 6f 6b c0 e8 23 0c 6d de <0f> 0b b8 ea ff ff ff eb dc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[ 34.559187] RSP: 0018:c030838f7a28 EFLAGS: 00010282
[ 34.559191] RAX: RBX: a0cf6b88eb80 RCX: 0027
[ 34.559193] RDX: 0027 RSI: 0086 RDI: a0cf6fc98d08
[ 34.559196] RBP: c030838f7b28 R08: a0cf6fc98d00 R09: 0004
[ 34.559198] R10: R11: 0001 R12: c030838f7b28
[ 34.559201] R13: a0cf6a622868 R14: a0cf6c7cc6e8 R15: c030838f7b28
[ 34.559204] FS: 7f46ae245cc0() GS:a0cf6fc8() knlGS:
[ 34.559207] CS: 0010 DS: ES: CR0: 80050033
[ 34.559210] CR2: 56494261c1c8 CR3: 00040bfe6000 CR4: 06e0
[ 34.559212] Call Trace:
[ 34.559286] radeon_ttm_backend_bind+0x58/0x210 [radeon]
[ 34.559305] ttm_tt_bind+0x32/0x60 [ttm]
[ 34.559321] ttm_bo_handle_move_mem+0x236/0x590 [ttm]
[ 34.559339] ttm_bo_validate+0x16c/0x180 [ttm]
[ 34.559407] ? drm_ioctl_kernel+0xe9/0xf0 [drm]
[ 34.559422] ttm_bo_init_reserved+0x2ae/0x320 [ttm]
[ 34.559438] ttm_bo_init+0x6d/0xf0 [ttm]
[ 34.559504] ? radeon_update_memory_usage.isra.0+0x50/0x50 [radeon]
[ 34.559569] radeon_bo_create+0x184/0x210 [radeon]
[ 34.559634] ? radeon_update_memory_usage.isra.0+0x50/0x50 [radeon]
[ 34.559703] radeon_gem_object_create+0xa9/0x180 [radeon]
[ 34.559773] ? radeon_gem_pwrite_ioctl+0x20/0x20 [radeon]
[ 34.559840] radeon_gem_create_ioctl+0x66/0x120 [radeon]
[ 34.559850] ? tomoyo_path_number_perm+0x66/0x1d0
[ 34.559918] ? radeon_gem_pwrite_ioctl+0x20/0x20 [radeon]
[ 34.559968] drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 34.560021] drm_ioctl+0x1ec/0x390 [drm]
[ 34.560090] ? radeon_gem_pwrite_ioctl+0x20/0x20 [radeon]
[ 34.560152] radeon_drm_ioctl+0x49/0x80 [radeon]
[ 34.560160] __x64_sys_ioctl+0x83/0xb0
[ 34.560167] do_syscall_64+0x33/0x80
[ 34.560174] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 34.560179] RIP: 0033:0x7f46b369550b
[ 34.560183] Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48
[ 34.560186] RSP: 002b:7ffdb7421658 EFLAGS: 0246 ORIG_RAX: 0010
[ 34.560189] RAX: ffda RBX: 7ffdb74216d0 RCX: 7f46b369550b
[