[Bug 1902981] Re: AGP GPU on PCI mode (when AGP is disabled at kernel build time) known to fail on K8 and K10 platforms

2020-11-04 Thread Thomas Debesse
On a side note: because we see a clear behaviour difference when
applying the PCI patch, we can assume the driver takes the
`rdev->flags & RADEON_IS_PCI` branch instead of the
`rdev->flags & RADEON_IS_AGP` one when running an AGP GPU with AGP
disabled in the kernel at build time.
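
To illustrate the assumption above, here is a minimal sketch of how a
flag test of that shape picks a branch once the AGP bit is no longer set.
The flag values and the two helper functions are hypothetical and only
model the idea; they are not copied from the radeon driver.

```python
# Hypothetical bit flags, modelled after the `rdev->flags & RADEON_IS_*` tests.
# The numeric values are assumptions for illustration, not the driver's own.
RADEON_IS_AGP = 1 << 0
RADEON_IS_PCIE = 1 << 1
RADEON_IS_PCI = 1 << 2


def select_bus_path(flags: int) -> str:
    """Return which bus-specific branch a flag test of this shape would take."""
    if flags & RADEON_IS_AGP:
        return "agp"
    if flags & RADEON_IS_PCIE:
        return "pcie"
    if flags & RADEON_IS_PCI:
        return "pci"
    return "unknown"


def flags_without_agp(flags: int) -> int:
    """Model an AGP card handled as plain PCI: clear the AGP bit, set the PCI bit."""
    return (flags & ~RADEON_IS_AGP) | RADEON_IS_PCI
```

With these definitions, `select_bus_path(flags_without_agp(RADEON_IS_AGP))`
takes the "pci" branch, which is the behaviour assumed in this comment.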

** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

** Tags added: amd64 focal kernel-bug

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1902981

Title:
  AGP GPU on PCI mode (when AGP is disabled at kernel build time) known
  to fail on K8 and K10 platforms

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1902981/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1902981] Re: AGP GPU on PCI mode (when AGP is disabled at kernel build time) known to fail on K8 and K10 platforms

2020-11-04 Thread Thomas Debesse
It looks like comment #3 was truncated. The interesting part of the
dmesg log that is missing is:

```

[   66.755306] radeon :01:00.0: ring 0 stalled for more than 31248msec
[   66.755317] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   66.840372] radeon :01:00.0: Saved 25 dwords of commands on ring 0.
[   66.840402] radeon :01:00.0: GPU softreset: 0x0019
[   66.840408] radeon :01:00.0:   R_008010_GRBM_STATUS  = 0xA27034A1
[   66.840414] radeon :01:00.0:   R_008014_GRBM_STATUS2 = 0x0102
[   66.840419] radeon :01:00.0:   R_000E50_SRBM_STATUS  = 0x200028C0
[   66.840424] radeon :01:00.0:   R_008674_CP_STALLED_STAT1 = 0x0400
[   66.840429] radeon :01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010100
[   66.840434] radeon :01:00.0:   R_00867C_CP_BUSY_STAT = 0x8C80
[   66.840438] radeon :01:00.0:   R_008680_CP_STAT  = 0x808182E7
[   66.840443] radeon :01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   67.364934] radeon :01:00.0: Wait for MC idle timedout !
[   67.364940] radeon :01:00.0: R_008020_GRBM_SOFT_RESET=0x7F6B
[   67.365005] radeon :01:00.0: SRBM_SOFT_RESET=0x0100
[   67.367106] radeon :01:00.0:   R_008010_GRBM_STATUS  = 0x3028
[   67.367110] radeon :01:00.0:   R_008014_GRBM_STATUS2 = 0x0002
[   67.367114] radeon :01:00.0:   R_000E50_SRBM_STATUS  = 0x200028C0
[   67.367118] radeon :01:00.0:   R_008674_CP_STALLED_STAT1 = 0x
[   67.367122] radeon :01:00.0:   R_008678_CP_STALLED_STAT2 = 0x
[   67.367126] radeon :01:00.0:   R_00867C_CP_BUSY_STAT = 0x
[   67.367130] radeon :01:00.0:   R_008680_CP_STAT  = 0x
[   67.367134] radeon :01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[   67.367152] radeon :01:00.0: GPU reset succeeded, trying to resume
[   67.842179] radeon :01:00.0: Wait for MC idle timedout !
[   68.068765] radeon :01:00.0: Wait for MC idle timedout !
[   68.082273] [drm] PCIE GART of 1024M enabled (table at 0x0014C000).
[   68.082448] radeon :01:00.0: WB enabled
[   68.082454] radeon :01:00.0: fence driver on ring 0 use gpu addr 0x4c00
[   68.082459] radeon :01:00.0: fence driver on ring 3 use gpu addr 0x4c0c
[   68.088977] radeon :01:00.0: fence driver on ring 5 use gpu addr 0x0005c598
[   68.374095] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[   68.374176] [drm:rv770_resume [radeon]] *ERROR* r600 startup failed on resume
```

This is what happens when applying the patch that forces a 32-bit DMA
bit mask on PCI devices.
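
To make "32-bit DMA bit mask" concrete: an n-bit mask is `(1 << n) - 1`,
so a 32-bit mask limits the device to bus addresses below 4 GiB. The
sketch below only illustrates that arithmetic; the helper names are
hypothetical and this is not the patch itself.

```python
GIB = 1 << 30


def dma_bit_mask(bits: int) -> int:
    """Highest address reachable with a `bits`-wide DMA mask."""
    if not 1 <= bits <= 64:
        raise ValueError("bits must be between 1 and 64")
    return (1 << bits) - 1


def is_dma_reachable(address: int, bits: int) -> bool:
    """True when `address` fits within a `bits`-wide DMA mask."""
    return 0 <= address <= dma_bit_mask(bits)
```

For example, `is_dma_reachable(4 * GIB, 32)` is false, so on a host with
more than 4 GiB of RAM (such as the 16GB host described in this thread)
part of the memory is out of direct reach under a 32-bit mask.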

[Bug 1902981] Re: AGP GPU on PCI mode (when AGP is disabled at kernel build time) known to fail on K8 and K10 platforms

2020-11-04 Thread Thomas Debesse
To get a better picture of the performance of such a top-of-the-line AGP
GPU, we can compare it to other GPUs in the Unvanquished GPU
compatibility matrix:
https://wiki.unvanquished.net/wiki/GPU_compatibility_matrix

There we can see that the ATI Radeon HD 4670 AGP (RV730 XT, TeraScale 1)
performs:

- better than the PCI Express ATI Radeon HD 7450 from Q1 2012 (RV910, Caicos, TeraScale 2),
- like the mobile Nvidia GeForce GT 740M from Q2 2013 with nvidia driver (NVE7, GK107M, Kepler),
- like the mobile Quadro K1100M from Q3 2013 with nvidia driver (NVE7, GK107GLM, Kepler),
- like the integrated Intel HD 4600 from Q1 2014 (i7-4810MQ, Haswell, Gen7 GT2),
- like the integrated Intel HD 520 from Q3 2015 (i3-6100U, Skylake, Gen9 GT2),
- like the PCI Express GeForce GTX 1050 Ti from Q4 2016 when running the nouveau driver (Pascal).

On the Nvidia side, outperforming this GPU on Linux with the free and
open source nouveau driver requires at least a GeForce GTX 1060 from
2016 (NV136, GP106-300-A1, Pascal).

Intel users may have had to wait for the UHD 600 series (2016) to
outperform this ATI AGP GPU. To this day, the first verified Intel GPU
known to outperform this ATI AGP GPU is the UHD 620 from Q3 2019.

[Bug 1902981] Re: AGP GPU on PCI mode (when AGP is disabled at kernel build time) known to fail on K8 and K10 platforms

2020-11-04 Thread Thomas Debesse
When applying the patch from https://bugs.launchpad.net/bugs/1902795

- https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1902795/+attachment/5431335/+files/0001-drm-radeon-make-all-PCI-GPUs-use-32bits-DMA-bit-mask.patch

which reduces (but does not completely fix) the issues faced with PCI
GPUs on K8 and K10 hosts by setting the DMA bit mask to 32 bits for all
PCI GPUs, we can see that this error, which is fixed on PCI GPUs, is not
fixed on AGP-as-PCI GPUs (and there are even more errors before it):

```
[ 5.242322] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
```

Things go so wrong that we don't even see the other errors that are
expected after that:

```
[ 5.242359] radeon :01:00.0: disabling GPU acceleration
```

```
 [ 34.558889] trying to bind memory to uninitialized GART !
```

Instead, the kernel loops before reaching those errors, trying
desperately to pass this r600_ring_test step.
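
For context on what `scratch(0x8504)=0xCAFEDEAD` means: as far as I
understand the ring test, the CPU first writes a marker (0xCAFEDEAD) to
a scratch register, queues a command on the ring asking the GPU to
overwrite it with another value (0xDEADBEEF, as I recall it from the
driver source), then polls the register. Still reading the first marker
means the GPU never executed the ring command. The toy model below
illustrates that logic; `FakeGpu` and `ring_test` are hypothetical names,
and nothing here touches real hardware.

```python
SCRATCH_INIT = 0xCAFEDEAD  # marker the CPU writes before the test
SCRATCH_DONE = 0xDEADBEEF  # value the GPU is asked to write via the ring


class FakeGpu:
    """Toy stand-in for the GPU; `alive` decides whether ring commands execute."""

    def __init__(self, alive: bool) -> None:
        self.alive = alive
        self.scratch = 0

    def submit_scratch_write(self, value: int) -> None:
        # A working GPU consumes the ring packet and updates the register;
        # a hung one leaves the register untouched.
        if self.alive:
            self.scratch = value


def ring_test(gpu: FakeGpu, max_polls: int = 1000) -> bool:
    """Return True if the scratch register shows the GPU ran the ring command."""
    gpu.scratch = SCRATCH_INIT
    gpu.submit_scratch_write(SCRATCH_DONE)
    for _ in range(max_polls):
        if gpu.scratch == SCRATCH_DONE:
            return True
    return False
```

With `FakeGpu(alive=False)` the test fails and the scratch register still
holds 0xCAFEDEAD, which is the value reported in the error message.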

But before the r600_ring_test failure message is printed, additional
new issues occur with AGP-as-PCI GPUs: ring 0 being stalled and the GPU
locking up. These are never seen with PCI-native GPUs, especially taking
into account that PCI GPUs can at least pass the r600_ring_test with the
patch.
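
As a small aid for reading those stall messages, the sketch below
extracts the reported stall time from each "ring N stalled" line and
computes the difference between consecutive reports. The sample lines
are copied from the log excerpt later in this message; the function
names are hypothetical.

```python
import re

STALL_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s.*ring (\d+) stalled for more than (\d+)msec"
)


def parse_stalls(lines):
    """Return (timestamp_seconds, ring, stalled_msec) for each stall message."""
    stalls = []
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            stalls.append(
                (float(match.group(1)), int(match.group(2)), int(match.group(3)))
            )
    return stalls


def stall_deltas(stalls):
    """Differences in reported stall time between consecutive messages (msec)."""
    return [b[2] - a[2] for a, b in zip(stalls, stalls[1:])]


SAMPLE = [
    "[   45.763336] radeon :01:00.0: ring 0 stalled for more than 10256msec",
    "[   46.275324] radeon :01:00.0: ring 0 stalled for more than 10768msec",
    "[   46.787322] radeon :01:00.0: ring 0 stalled for more than 11280msec",
]
```

On the sample, `stall_deltas(parse_stalls(SAMPLE))` gives 512 msec steps,
which matches the roughly half-second spacing of the timestamps. This
suggests the same stall is reported again and again rather than a series
of separate stalls, though that reading is only an interpretation of the
log.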

Also, after the r600_ring_test failure message, instead of the message
saying that GPU acceleration is disabled, we get a message about r600
startup failing on resume, which is new.

This is why it is believed that fixing PCI GPUs may not be enough to fix
AGP GPUs running as PCI ones when AGP is disabled at kernel build time.

Here are the issues that are only seen with AGP-as-PCI GPUs, occurring
before and after the r600_ring_test failure message:

```
[   45.763336] radeon :01:00.0: ring 0 stalled for more than 10256msec
[   45.763349] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   46.275324] radeon :01:00.0: ring 0 stalled for more than 10768msec
[   46.275335] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   46.787322] radeon :01:00.0: ring 0 stalled for more than 11280msec
[   46.787332] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   47.299336] radeon :01:00.0: ring 0 stalled for more than 11792msec
[   47.299346] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   47.811320] radeon :01:00.0: ring 0 stalled for more than 12304msec
[   47.811332] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   48.323331] radeon :01:00.0: ring 0 stalled for more than 12816msec
[   48.323344] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   48.835307] radeon :01:00.0: ring 0 stalled for more than 13328msec
[   48.835318] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   49.347328] radeon :01:00.0: ring 0 stalled for more than 13840msec
[   49.347341] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   49.859316] radeon :01:00.0: ring 0 stalled for more than 14352msec
[   49.859326] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   50.371471] radeon :01:00.0: ring 0 stalled for more than 14864msec
[   50.371483] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   50.883318] radeon :01:00.0: ring 0 stalled for more than 15376msec
[   50.883328] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   51.395315] radeon :01:00.0: ring 0 stalled for more than 15888msec
[   51.395327] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   51.907325] radeon :01:00.0: ring 0 stalled for more than 16400msec
[   51.907338] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   52.419319] radeon :01:00.0: ring 0 stalled for more than 16912msec
[   52.419330] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   52.931321] radeon :01:00.0: ring 0 stalled for more than 17424msec
[   52.931331] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   53.443321] radeon :01:00.0: ring 0 stalled for more than 17936msec
[   53.44] radeon :01:00.0: GPU lockup (current fence id 0x0001 last fence id 0x0002 on ring 0)
[   53.955335] 
```

[Bug 1902981] Re: AGP GPU on PCI mode (when AGP is disabled at kernel build time) known to fail on K8 and K10 platforms

2020-11-04 Thread Thomas Debesse
As a reminder, this is a dmesg log captured when running an ATI Radeon
HD 4670 AGP on a K10 host on Linux 5.9 (vanilla).

The ATI Radeon HD 4670 AGP (RV730 XT) is a very capable TeraScale GPU,
supporting OpenGL 3.3 (DirectX 10 on Windows) and OpenCL 1.0, and
featuring HDMI output and 1GB of VRAM. The host is also very capable: an
AMD Phenom II quad-core CPU with 16GB of RAM.

To verify whether its performance matches 2020 expectations, I entered
it (running Ubuntu 20.04) in the 2020 Xonotic Defrag World Championship,
which is currently running (https://xdwc.teichisma.info/), and I got
feedback from some players reporting that this hardware may be better
than the hardware they compete with. In fact, competitive games like
Xonotic run at 144fps at 1920×1080 resolution.

The last kernel able to drive this GPU on Ubuntu 20.04 LTS is
5.4.0-47-generic; 5.4.0-48-generic is believed to have backported the
AGP disablement from 5.9-rc1 (ba806f9).

So, when running the 5.4.0-48-generic kernel from the Ubuntu
repositories or, as here, a vanilla 5.9 kernel I compiled myself, the
interesting parts of the dmesg log are:

```
[5.242322] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[5.242359] radeon :01:00.0: disabling GPU acceleration
```

and:

```
[   34.558889] trying to bind memory to uninitialized GART !
[   34.559048] WARNING: CPU: 1 PID: 2516 at 
drivers/gpu/drm/radeon/radeon_gart.c:299 radeon_gart_bind+0xdf/0xf0 [radeon]
[   34.559050] Modules linked in: zram snd_usb_audio snd_hda_intel 
snd_intel_dspcfg snd_hda_codec snd_usbmidi_lib snd_hda_core snd_hwdep snd_pcm 
snd_seq_midi kvm_amd snd_seq_midi_event ccp joydev kvm snd_seq snd_rawmidi 
input_leds snd_timer snd_seq_device snd soundcore k10temp mac_hid serio_raw 
binfmt_misc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables btrfs 
blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy 
async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear 
uas usb_storage hid_generic usbhid hid radeon i2c_algo_bit drm_kms_helper 
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm psmouse forcedeth 
i2c_nforce2
[   34.559107] CPU: 1 PID: 2516 Comm: gnome-shell Not tainted 5.9.0 #1
[   34.559109] Hardware name: To Be Filled By O.E.M. To Be Filled By 
O.E.M./AM2NF3-VSTA, BIOS P3.20 10/09/2009
[   34.559178] RIP: 0010:radeon_gart_bind+0xdf/0xf0 [radeon]
[   34.559184] Code: 00 48 89 ef 48 8b 40 60 e8 0e 2f 44 df 31 c0 48 83 c4 08 
5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 38 6f 6b c0 e8 23 0c 6d de <0f> 0b b8 
ea ff ff ff eb dc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[   34.559187] RSP: 0018:c030838f7a28 EFLAGS: 00010282
[   34.559191] RAX:  RBX: a0cf6b88eb80 RCX: 0027
[   34.559193] RDX: 0027 RSI: 0086 RDI: a0cf6fc98d08
[   34.559196] RBP: c030838f7b28 R08: a0cf6fc98d00 R09: 0004
[   34.559198] R10:  R11: 0001 R12: c030838f7b28
[   34.559201] R13: a0cf6a622868 R14: a0cf6c7cc6e8 R15: c030838f7b28
[   34.559204] FS:  7f46ae245cc0() GS:a0cf6fc8() 
knlGS:
[   34.559207] CS:  0010 DS:  ES:  CR0: 80050033
[   34.559210] CR2: 56494261c1c8 CR3: 00040bfe6000 CR4: 06e0
[   34.559212] Call Trace:
[   34.559286]  radeon_ttm_backend_bind+0x58/0x210 [radeon]
[   34.559305]  ttm_tt_bind+0x32/0x60 [ttm]
[   34.559321]  ttm_bo_handle_move_mem+0x236/0x590 [ttm]
[   34.559339]  ttm_bo_validate+0x16c/0x180 [ttm]
[   34.559407]  ? drm_ioctl_kernel+0xe9/0xf0 [drm]
[   34.559422]  ttm_bo_init_reserved+0x2ae/0x320 [ttm]
[   34.559438]  ttm_bo_init+0x6d/0xf0 [ttm]
[   34.559504]  ? radeon_update_memory_usage.isra.0+0x50/0x50 [radeon]
[   34.559569]  radeon_bo_create+0x184/0x210 [radeon]
[   34.559634]  ? radeon_update_memory_usage.isra.0+0x50/0x50 [radeon]
[   34.559703]  radeon_gem_object_create+0xa9/0x180 [radeon]
[   34.559773]  ? radeon_gem_pwrite_ioctl+0x20/0x20 [radeon]
[   34.559840]  radeon_gem_create_ioctl+0x66/0x120 [radeon]
[   34.559850]  ? tomoyo_path_number_perm+0x66/0x1d0
[   34.559918]  ? radeon_gem_pwrite_ioctl+0x20/0x20 [radeon]
[   34.559968]  drm_ioctl_kernel+0xaa/0xf0 [drm]
[   34.560021]  drm_ioctl+0x1ec/0x390 [drm]
[   34.560090]  ? radeon_gem_pwrite_ioctl+0x20/0x20 [radeon]
[   34.560152]  radeon_drm_ioctl+0x49/0x80 [radeon]
[   34.560160]  __x64_sys_ioctl+0x83/0xb0
[   34.560167]  do_syscall_64+0x33/0x80
[   34.560174]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   34.560179] RIP: 0033:0x7f46b369550b
[   34.560183] Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 
c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48
[   34.560186] RSP: 002b:7ffdb7421658 EFLAGS: 0246 ORIG_RAX: 
0010
[   34.560189] RAX: ffda RBX: 7ffdb74216d0 RCX: 7f46b369550b
[