Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-24 Thread Mikhail Gavrilov
On Thu, 21 Jan 2021 at 18:27, Christian König  wrote:
>
> I still have no idea what's going on here.
>
> The KASAN messages from the DC code are completely unrelated.
>
> Please add the full dmesg to your bug report.
>

I did it.
https://gitlab.freedesktop.org/drm/amd/-/issues/1439#note_776267

-- 
Best Regards,
Mike Gavrilov.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-21 Thread Christian König
 ]---

The issue with the monitor switching off still happens too, but the
messages in the logs have become more detailed:
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -4!
amdgpu 0000:0b:00.0: amdgpu: 000087613007 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin
framebuffer with error -12
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -4!
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -4!

I hope "[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the
buffer list -4!" gives an idea of what happened.

Full kernel log is here: 
https://pastebin.com/nX69zgvf





Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-19 Thread Mikhail Gavrilov
 in: amdgpu(+) drm_ttm_helper ttm iommu_v2 gpu_sched
drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm
ghash_clmulni_intel ccp igb nvme dca nvme_core i2c_algo_bit xhci_pci
xhci_pci_renesas wmi pinctrl_amd fuse
CPU: 25 PID: 500 Comm: systemd-udevd Tainted: GW
- ---  5.11.0-0.rc4.129.fc34.x86_64+debug #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 2802 10/21/2020
RIP: 0010:lockdep_init_map_waits+0x592/0x770
Code: 08 84 d2 0f 85 d8 01 00 00 8b 3d e1 02 38 04 85 ff 0f 85 7e fc
ff ff 48 c7 c6 e0 04 ca 8e 48 c7 c7 40 fd c9 8e e8 01 8e 23 02 <0f> 0b
e9 64 fc ff ff 48 89 df 44 89 4c 24 0c 44 89 44 24 08 48 89
RSP: 0018:c900029bef88 EFLAGS: 00010282
RAX:  RBX: 0003 RCX: 
RDX: 0027 RSI: 0004 RDI: f52000537de7
RBP:  R08: 0001 R09: 8886f9fe72ab
R10: ed10df3fce55 R11: 0001 R12: 88810b0d9148
R13:  R14: 8edbda60 R15: 88810b0db690
FS:  7f2c0fdda140() GS:8886f9e0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 55b8800aec68 CR3: 000127fd CR4: 00350ee0
Call Trace:
 ? lockdep_hardirqs_on+0x75/0xf0
 __kernfs_create_file+0x102/0x2f0
 sysfs_add_file_mode_ns+0x1af/0x500
 sysfs_create_bin_file+0x100/0x160
 ? lock_is_held_type+0xb8/0xf0
 ? sysfs_add_file_to_group+0x150/0x150
 ? static_obj+0x8a/0xc0
 ? lockdep_init_map_waits+0x2a2/0x770
 hdcp_create_workqueue+0x879/0xb50 [amdgpu]
 amdgpu_dm_init.isra.0.cold+0x7f2/0x374c [amdgpu]
 ? vprintk_emit+0x140/0x460
 ? dev_vprintk_emit+0x2d8/0x31a
 ? sched_clock+0x5/0x10
 ? dm_resume+0x13b0/0x13b0 [amdgpu]
 ? dev_attr_show.cold+0x35/0x35
 ? psp_set_srm+0x250/0x250 [amdgpu]
 ? hdcp_update_display+0x5b0/0x5b0 [amdgpu]
 ? lock_downgrade+0x6b0/0x6b0
 ? dev_printk_emit+0x8c/0xa8
 ? dev_vprintk_emit+0x31a/0x31a
 ? wait_for_completion_io+0x240/0x240
 ? __dev_printk+0x71/0xdf
 ? smu_hw_init.cold+0x16b/0x18a [amdgpu]
 ? smu_suspend+0x240/0x240 [amdgpu]
 ? navi10_ih_irq_init+0xea3/0x2420 [amdgpu]
 dm_hw_init+0xe/0x20 [amdgpu]
 amdgpu_device_init.cold+0x3031/0x4940 [amdgpu]
 ? amdgpu_device_cache_pci_state+0xf0/0xf0 [amdgpu]
 ? pci_bus_read_config_byte+0x140/0x140
 ? do_pci_enable_device+0x1f8/0x260
 ? pci_find_saved_ext_cap+0x110/0x110
 ? pci_enable_bridge+0xf9/0x1e0
 ? pci_dev_check_d3cold+0x107/0x250
 ? pci_enable_device_flags+0x201/0x340
 amdgpu_driver_load_kms+0x167/0x8a0 [amdgpu]
 amdgpu_pci_probe+0x235/0x360 [amdgpu]
 ? amdgpu_pci_remove+0xd0/0xd0 [amdgpu]
 local_pci_probe+0xd8/0x170
 pci_device_probe+0x318/0x5c0
 ? kernfs_create_link+0x16c/0x230
 ? pci_device_remove+0x1d0/0x1d0
 really_probe+0x224/0xc40
 driver_probe_device+0x1f2/0x380
 device_driver_attach+0x1df/0x250
 __driver_attach+0xf6/0x260
 ? device_driver_attach+0x250/0x250
 bus_for_each_dev+0x114/0x180
 ? subsys_dev_iter_exit+0x10/0x10
 bus_add_driver+0x352/0x570
 driver_register+0x20f/0x390
 ? __pci_register_driver+0x13a/0x210
 ? 0xc1d8d000
 do_one_initcall+0xfb/0x530
 ? perf_trace_initcall_level+0x3d0/0x3d0
 ? __memset+0x2b/0x30
 ? unpoison_range+0x3a/0x60
 do_init_module+0x1ce/0x7a0
 load_module+0x9841/0xa380
 ? module_frob_arch_sections+0x20/0x20
 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
 ? sched_clock_cpu+0x18/0x170
 ? sched_clock+0x5/0x10
 ? lock_acquire+0x2dd/0x7a0
 ? sched_clock+0x5/0x10
 ? lock_is_held_type+0xb8/0xf0
 ? __do_sys_init_module+0x18b/0x220
 __do_sys_init_module+0x18b/0x220
 ? load_module+0xa380/0xa380
 ? ktime_get_coarse_real_ts64+0x12f/0x160
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f2c109da07e
Code: 48 8b 0d f5 1d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d c2 1d 0c 00 f7 d8 64 89 01 48
RSP: 002b:7ffc84d33f88 EFLAGS: 0246 ORIG_RAX: 00af
RAX: ffda RBX: 55b87f8260a0 RCX: 7f2c109da07e
RDX: 55b87f834060 RSI: 01e2cbf6 RDI: 7f2c0b7e0010
RBP: 7f2c0b7e0010 R08: 55b87f8281e0 R09: 7ffc84d30a26
R10: 55bd2404cc18 R11: 0246 R12: 55b87f834060
R13: 55b87f831ca0 R14:  R15: 55b87f832640
irq event stamp: 593331
hardirqs last  enabled at (593331): []
console_unlock+0x7c0/0x9a0
hardirqs last disabled at (593330): []
console_unlock+0x6b8/0x9a0
softirqs last  enabled at (593162): []
asm_call_irq_on_stack+0x12/0x20
softirqs last disabled at (593157): []
asm_call_irq_on_stack+0x12/0x20
---[ end trace 37dc3a4a3aa1704a ]---

The issue with the monitor switching off still happens too, but the
messages in the logs have become more detailed:
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -4!
amdgpu 0000:0b:00.0: amdgpu: 000087613007 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin
framebuffer with error -12
[drm:amdgpu_cs_ioctl [amdgpu]] *E

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-14 Thread Mikhail Gavrilov
On Thu, 14 Jan 2021 at 18:56, Christian König  wrote:
> Unfortunately not off hand.
>
> I also don't see any bug reports from other people and can't reproduce
> the last backtrace you sent from TTM here.

Because only the most desperate will install kernels with enabled
debug flags and then load the system by opening a huge number of
programs and tabs. So you shouldn't be surprised that I'm the only one
here.
This is what my desktop looks like every day: https://imgur.com/a/Kxlmrem

> Do you have any local modifications or special setup in your system?
> Like bpf scripts or something like that?

No, I didn't write any bpf scripts, but it looks like my distribution,
Fedora Rawhide, uses some bpf scripts by default out of the box:

# bpftool prog
20: cgroup_device  tag 40ddf486530245f5  gpl
loaded_at 2021-01-15T01:30:04+0500  uid 0
xlated 504B  jited 309B  memlock 4096B
21: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:04+0500  uid 0
xlated 64B  jited 54B  memlock 4096B
22: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:04+0500  uid 0
xlated 64B  jited 54B  memlock 4096B
23: cgroup_device  tag ca8e50a3c7fb034b  gpl
loaded_at 2021-01-15T01:30:05+0500  uid 0
xlated 496B  jited 307B  memlock 4096B
24: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:05+0500  uid 0
xlated 64B  jited 54B  memlock 4096B
25: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:05+0500  uid 0
xlated 64B  jited 54B  memlock 4096B
26: cgroup_device  tag be31ae23198a0378  gpl
loaded_at 2021-01-15T01:30:13+0500  uid 0
xlated 464B  jited 288B  memlock 4096B
27: cgroup_device  tag ee0e253c78993a24  gpl
loaded_at 2021-01-15T01:30:13+0500  uid 0
xlated 416B  jited 255B  memlock 4096B
28: cgroup_device  tag 438c5618576e5b0c  gpl
loaded_at 2021-01-15T01:30:13+0500  uid 0
xlated 568B  jited 354B  memlock 4096B
29: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:13+0500  uid 0
xlated 64B  jited 54B  memlock 4096B
30: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:13+0500  uid 0
xlated 64B  jited 54B  memlock 4096B
31: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:13+0500  uid 0
xlated 64B  jited 54B  memlock 4096B
32: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:13+0500  uid 0
xlated 64B  jited 54B  memlock 4096B
33: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:14+0500  uid 0
xlated 64B  jited 54B  memlock 4096B
34: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:14+0500  uid 0
xlated 64B  jited 54B  memlock 4096B
35: cgroup_device  tag ee0e253c78993a24  gpl
loaded_at 2021-01-15T01:30:14+0500  uid 0
xlated 416B  jited 255B  memlock 4096B
38: cgroup_device  tag 3a0ef5414c2f6fca  gpl
loaded_at 2021-01-15T01:30:14+0500  uid 0
xlated 744B  jited 447B  memlock 4096B
39: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:14+0500  uid 0
xlated 64B  jited 54B  memlock 4096B
40: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:14+0500  uid 0
xlated 64B  jited 54B  memlock 4096B
41: cgroup_device  tag ee0e253c78993a24  gpl
loaded_at 2021-01-15T01:30:18+0500  uid 0
xlated 416B  jited 255B  memlock 4096B
42: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:18+0500  uid 0
xlated 64B  jited 54B  memlock 4096B
43: cgroup_skb  tag 6deef7357e7b4530  gpl
loaded_at 2021-01-15T01:30:18+0500  uid 0
xlated 64B  jited 54B  memlock 4096B

I caught another couple of leaks, but nothing new:
https://pastebin.com/2EgvYJdz

[1] do_detailed_mode+0x7c1/0x13d0 [drm]
[2] drm_mode_duplicate+0x45/0x220 [drm]
[3] do_seccomp+0x215/0x2280
[4] __vmalloc_node_range+0x464/0x7b0
[5] bpf_prog_alloc_no_stats+0xa2/0x2b0
[6] bpf_prog_store_orig_filter+0x7b/0x1c0
[7] kmemdup+0x1a/0x40

Did the following trace message confuse anyone?
==
BUG: KASAN: slab-out-of-bounds in
kfd_create_crat_image_virtual+0x12d2/0x1380 [amdgpu]
Read of size 1 at addr 88812a6b4181 by task systemd-udevd/491

CPU: 20 PID: 491 Comm: systemd-udevd Not tainted
5.11.0-0.rc3.20210114git65f0d2414b70.125.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 2802 10/21/2020
Call Trace:
 dump_stack+0xae/0xe5
 print_address_description.constprop.0+0x18/0x160
 ? kfd_create_crat_image_virtual+0x12d2/0x1380 [amdgpu]
 kasan_report.cold+0x7f/0x10e
 ? kfd_create_crat_image_virtual+0x12d2/0x1380 [amdgpu]
 kfd_create_crat_image_virtual+0x12d2/0x1380 [amdgpu]
 ? kfd_create_crat_image_acpi+0x340/0x340 [amdgpu]
 ? __raw_spin_lock_init+0x39/0x110
 kfd_topology_init+0x2ac/0x400 [amdgpu]
 ? kfd_create_topology_device+0x320/0x320 [amdgpu]
 ? __class_register+0x2ad/0x430
 ? __class_create+0xc5/0x130
 kgd2kfd_init+0x95/0xf0 [amdgpu]
 

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-14 Thread Daniel Vetter
On Thu, Jan 14, 2021 at 2:56 PM Christian König
 wrote:
>
> On 14.01.21 at 01:22, Mikhail Gavrilov wrote:
> > On Tue, 12 Jan 2021 at 01:45, Christian König  
> > wrote:
> >> But what you have in your logs so far are only unrelated symptoms, the
> >> root of the problem is that somebody is leaking memory.
> >>
> >> What you could do as well is to try to enable kmemleak
> > I captured some memleaks.
> > Do they contain any useful information?
>
> Unfortunately not off hand.
>
> I also don't see any bug reports from other people and can't reproduce
> the last backtrace you sent from TTM here.
>
> Do you have any local modifications or special setup in your system?
> Like bpf scripts or something like that?

There's another bug report (for rcar-du, bisected to a switch to
using more cma helpers) about leaking mmaps, which keep too many fbs
alive, so maybe we have gained a refcount leak somewhere recently. But
it could also be totally unrelated.
-Daniel



>
> Christian.
>
> >
> > [1] https://pastebin.com/n0FE7Hsu
> > [2] https://pastebin.com/MUX55L1k
> > [3] https://pastebin.com/a3FT7DVG
> > [4] https://pastebin.com/1ALvJKz7
> >
> > --
> > Best Regards,
> > Mike Gavrilov.
> > ___
> > amd-gfx mailing list
> > amd-...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-14 Thread Christian König

On 14.01.21 at 01:22, Mikhail Gavrilov wrote:
> On Tue, 12 Jan 2021 at 01:45, Christian König wrote:
>> But what you have in your logs so far are only unrelated symptoms, the
>> root of the problem is that somebody is leaking memory.
>>
>> What you could do as well is to try to enable kmemleak
> I captured some memleaks.
> Do they contain any useful information?

Unfortunately not off hand.

I also don't see any bug reports from other people and can't reproduce
the last backtrace you sent from TTM here.

Do you have any local modifications or special setup in your system?
Like bpf scripts or something like that?

Christian.

> [1] https://pastebin.com/n0FE7Hsu
> [2] https://pastebin.com/MUX55L1k
> [3] https://pastebin.com/a3FT7DVG
> [4] https://pastebin.com/1ALvJKz7
>
> --
> Best Regards,
> Mike Gavrilov.


Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-13 Thread Mikhail Gavrilov
On Tue, 12 Jan 2021 at 01:45, Christian König  wrote:
>
> But what you have in your logs so far are only unrelated symptoms, the
> root of the problem is that somebody is leaking memory.
>
> What you could do as well is to try to enable kmemleak

I captured some memleaks.
Do they contain any useful information?

[1] https://pastebin.com/n0FE7Hsu
[2] https://pastebin.com/MUX55L1k
[3] https://pastebin.com/a3FT7DVG
[4] https://pastebin.com/1ALvJKz7

--
Best Regards,
Mike Gavrilov.


Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-11 Thread Mikhail Gavrilov
Hi Christian,

On Tue, 12 Jan 2021 at 01:45, Christian König  wrote:
>
> Hi Mike,
>
> Unfortunately not, that's DC stuff. Easiest is to assign this to our DC
> team in the bug tracker.
Ok

> At least some progress. Any objections if I add your e-mail address in a
> Tested-by tag?
No objections, feel free to add me.

> I can take a look at this one here. Looks like some missing error
> handling when allocating memory.
> Can you decode which line number ttm_tt_swapin+0x34 points to?
$ /usr/src/kernels/`uname -r`/scripts/faddr2line \
    /lib/debug/lib/modules/`uname -r`/kernel/drivers/gpu/drm/ttm/ttm.ko.debug \
    ttm_tt_swapin+0x34
ttm_tt_swapin+0x34/0xd0:
mapping_gfp_mask at /usr/src/debug/kernel-20210108gitf5e6c330254a/linux-5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64/./include/linux/pagemap.h:105 (discriminator 2)
(inlined by) ttm_tt_swapin at /usr/src/debug/kernel-20210108gitf5e6c330254a/linux-5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64/drivers/gpu/drm/ttm/ttm_tt.c:210 (discriminator 2)
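
For anyone following the traces in this thread, here is a minimal sketch of how the symbol+offset/size notation breaks down. It is not how faddr2line itself works (that script resolves the offset against the debug info); parse_sym_ref and struct sym_ref are names made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical container for one "symbol+offset/size" reference. */
struct sym_ref {
    char name[64];          /* function name, e.g. "ttm_tt_swapin" */
    unsigned long offset;   /* byte offset of the instruction inside it */
    unsigned long size;     /* total size of the function in bytes */
};

/*
 * Split a string such as "ttm_tt_swapin+0x34/0x1b0" into its parts.
 * Returns 0 on success and -1 if the string does not have that shape.
 */
int parse_sym_ref(const char *s, struct sym_ref *out)
{
    /* Leading space skips indentation; %lx accepts the 0x prefix. */
    int n = sscanf(s, " %63[^+]+%lx/%lx",
                   out->name, &out->offset, &out->size);
    return n == 3 ? 0 : -1;
}
```

Given "ttm_tt_swapin+0x34/0x1b0" from the oops, this yields the function name, the offset of the faulting instruction within it (0x34), and the size of the function (0x1b0). The "? " marker that the kernel puts in front of unreliable frames is not stripped here.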

$ cat -s -n /usr/src/debug/kernel-20210108gitf5e6c330254a/linux-5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64/drivers/gpu/drm/ttm/ttm_tt.c \
    | head -220 | tail -20
   201  struct page *from_page;
   202  struct page *to_page;
   203  gfp_t gfp_mask;
   204  int i, ret;
   205
   206  swap_storage = ttm->swap_storage;
   207  BUG_ON(swap_storage == NULL);
   208
   209  swap_space = swap_storage->f_mapping;
   210  gfp_mask = mapping_gfp_mask(swap_space);
   211
   212  for (i = 0; i < ttm->num_pages; ++i) {
   213  from_page = shmem_read_mapping_page_gfp(swap_space, i,
   214  gfp_mask);
   215  if (IS_ERR(from_page)) {
   216  ret = PTR_ERR(from_page);
   217  goto out_err;
   218  }
   219  to_page = ttm->pages[i];
   220  if (unlikely(to_page == NULL)) {
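
One possible reading of the oops reported in this thread, offered as an assumption and not a confirmed diagnosis: line 210 reads a field through swap_space, and the faulting instruction bytes in the oops (<41> 8b 45 60) decode to a 32-bit load from r13+0x60, which matches the reported fault address of 0x60. That is the pattern you get when the base pointer is NULL and the field sits 0x60 bytes into the structure. The sketch below uses a made-up structure (demo_address_space is not the kernel's struct address_space; only the 0x60 offset is taken from the oops) to show how the fault address falls out of the field offset.

```c
#include <stddef.h>

/*
 * HYPOTHETICAL layout for illustration only. The padding stands in for
 * whatever fields precede gfp_mask in the real structure.
 */
struct demo_address_space {
    unsigned char other_fields[0x60];
    unsigned int gfp_mask;
};

/*
 * Address the CPU would try to read for "base->gfp_mask" when base is
 * NULL: the base address (0) plus the offset of the field.
 */
size_t null_base_fault_address(void)
{
    return 0 + offsetof(struct demo_address_space, gfp_mask);
}
```

With this layout the function returns 0x60, the same small address the kernel reported. A fault address that is a small constant like this usually points at a field access through a NULL structure pointer, not at a wild pointer.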

> Please use this one here:
> https://gitlab.freedesktop.org/drm/amd/-/issues/new
>
> If you can't find the DC guys off hand in the assignee list, just assign
> it to me and I will forward it.
https://gitlab.freedesktop.org/drm/amd/-/issues/1439
Ok, let's continue there.

--
Best Regards,
Mike Gavrilov.


Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-11 Thread Christian König

Hi Mike,

On 11.01.21 at 20:23, Mikhail Gavrilov wrote:
> On Mon, 11 Jan 2021 at 19:01, Christian König wrote:
>> Changing the page table attributes while releasing memory might sleep.
>> So we can't use a spinlock here.
>>
>> Thanks for the report, a patch to fix this is on the mailing list now.
> Can you also look at the first trace?

Unfortunately not, that's DC stuff. Easiest is to assign this to our DC
team in the bug tracker.

> It has the same error message "sleeping function called from invalid
> context" and a lot of [amdgpu] code.

[SNIP]

>>> -12 is just -ENOMEM. Looks like a memory leak to me, maybe caused by
>>> the problem above, maybe something completely unrelated.
>>>
>>> I will take a look.
>> This looks like a completely unrelated memory leak to me.
>>
>> Probably best if you open up a bug report for this.
> Yes, the monitor still turns off after applying the patch "make the pool
> shrinker lock a mutex".
> Anyway, the patch fixed the flood of "BUG: sleeping function called
> from invalid context at mm/vmalloc.c:1756" messages, so the kernel
> log became cleaner.

At least some progress. Any objections if I add your e-mail address in a
Tested-by tag?



> Now the monitor turning off looks like this in the logs:
>
> DMA-API: cacheline tracking ENOMEM, dma-debug disabled
> amdgpu 0000:0b:00.0: amdgpu: 000000006b791523 pin failed
> [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin
> framebuffer with error -12
> BUG: kernel NULL pointer dereference, address: 0000000000000060
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP NOPTI
> CPU: 20 PID: 3780 Comm: brave:cs0 Tainted: GW-
> ---  5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64 #1
> Hardware name: System manufacturer System Product Name/ROG STRIX
> X570-I GAMING, BIOS 2802 10/21/2020
> RIP: 0010:ttm_tt_swapin+0x34/0x1b0 [ttm]
> Code: 55 41 54 55 53 48 83 ec 10 48 8b 47 20 48 89 44 24 08 48 85 c0
> 0f 84 86 01 00 00 48 8b 44 24 08 49 89 fc 4c 8b a8 e0 01 00 00 <41> 8b
> 45 60 89 44 24 04 8b 47 0c 85 c0 0f 84 df 00 00 00 31 db 65
> RSP: 0018:a7400532b9c0 EFLAGS: 00010286
> RAX: 978e2ae25800 RBX: 97910ec12058 RCX: 978e12caac70
> RDX: 8010 RSI:  RDI: 97912c3d99c0
> RBP: 97912c3d99c0 R08:  R09: 70b3a000
> R10: 0002 R11:  R12: 97912c3d99c0
> R13:  R14: a7400532ba90 R15: 978e182c6350
> FS:  7f070bb1b640() GS:97950920() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0060 CR3: 0001f0cd2000 CR4: 00350ee0
> Call Trace:
>   ttm_tt_populate+0xa9/0xe0 [ttm]
>   ttm_bo_handle_move_mem+0x142/0x180 [ttm]
>   ttm_bo_validate+0x12e/0x1c0 [ttm]


I can take a look at this one here. Looks like some missing error
handling when allocating memory.

Can you decode which line number ttm_tt_swapin+0x34 points to?

[SNIP]

> You said that I need to open a bug report. Do you mean on
> https://bugzilla.kernel.org/ ?
> I thought the mailing lists were better, because bug reports on
> bugzilla.kernel.org are usually left open for several years without
> attention.

Please use this one here:
https://gitlab.freedesktop.org/drm/amd/-/issues/new

If you can't find the DC guys off hand in the assignee list, just assign
it to me and I will forward it.

But what you have in your logs so far are only unrelated symptoms; the
root of the problem is that somebody is leaking memory.

What you could do as well is to try to enable kmemleak and maybe try
some bleeding edge branch like drm-misc-fixes or Alex's
amd-staging-drm-next branch.


Thanks for the help,
Christian.


Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-11 Thread Mikhail Gavrilov
On Mon, 11 Jan 2021 at 19:01, Christian König  wrote:

> Changing the page table attributes while releasing memory might sleep.
> So we can't use a spinlock here.
>
> Thanks for the report, a patch to fix this is on the mailing list now.

Can you also look at the first trace?
It has the same error message "sleeping function called from invalid
context" and a lot of [amdgpu] code.

BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:196
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 501, name: systemd-udevd
1 lock held by systemd-udevd/501:
 #0: 978e0278d258 (&dev->mutex){....}-{3:3}, at:
device_driver_attach+0x3b/0xb0
CPU: 25 PID: 501 Comm: systemd-udevd Not tainted
5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 2802 10/21/2020
Call Trace:
 dump_stack+0x8b/0xb0
 ___might_sleep.cold+0xb6/0xc6
 ? dcn30_clock_source_create+0x34/0xb0 [amdgpu]
 kmem_cache_alloc_trace+0x204/0x230
 dcn30_clock_source_create+0x34/0xb0 [amdgpu]
 dcn30_create_resource_pool+0x1d9/0x13a0 [amdgpu]
 ? rcu_read_lock_sched_held+0x3f/0x80
 ? trace_kmalloc+0xb2/0xe0
 ? __kmalloc+0x191/0x280
 ? dc_create_resource_pool+0x110/0x1d0 [amdgpu]
 dc_create_resource_pool+0x110/0x1d0 [amdgpu]
 dc_create+0x205/0x790 [amdgpu]
 ? trace_kmalloc+0xb2/0xe0
 ? kmem_cache_alloc_trace+0x174/0x230
 amdgpu_dm_init.isra.0+0x1b9/0x250 [amdgpu]
 ? dev_vprintk_emit+0x171/0x195
 ? dev_printk_emit+0x3e/0x40
 dm_hw_init+0xe/0x20 [amdgpu]
 amdgpu_device_init.cold+0x179f/0x1afd [amdgpu]
 ? pci_conf1_read+0xa4/0x100
 amdgpu_driver_load_kms+0x68/0x280 [amdgpu]
 amdgpu_pci_probe+0x129/0x1b0 [amdgpu]
 local_pci_probe+0x42/0x80
 pci_device_probe+0xd9/0x1a0
 really_probe+0x205/0x460
 driver_probe_device+0xe1/0x150
 device_driver_attach+0xa8/0xb0
 __driver_attach+0x8c/0x150
 ? device_driver_attach+0xb0/0xb0
 ? device_driver_attach+0xb0/0xb0
 bus_for_each_dev+0x67/0x90
 bus_add_driver+0x12e/0x1f0
 driver_register+0x8f/0xe0
 ? 0xc0d9c000
 do_one_initcall+0x67/0x320
 ? rcu_read_lock_sched_held+0x3f/0x80
 ? trace_kmalloc+0xb2/0xe0
 ? kmem_cache_alloc_trace+0x174/0x230
 do_init_module+0x5c/0x270
 __do_sys_init_module+0x130/0x190
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f363661deee
Code: 48 8b 0d 85 1f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d 52 1f 0c 00 f7 d8 64 89 01 48
RSP: 002b:7ffeb7191588 EFLAGS: 0246 ORIG_RAX: 00af
RAX: ffda RBX: 561b94563170 RCX: 7f363661deee
RDX: 561b94579df0 RSI: 00b8a356 RDI: 7f3633b9e010
RBP: 7f3633b9e010 R08: 561b94565240 R09: 7ffeb718d786
R10: 561ef5ef1595 R11: 0246 R12: 561b94579df0
R13: 561b9457a3e0 R14:  R15: 561b94576530
[drm] Display Core initialized with v3.2.116!
[drm] DMUB hardware initialized: version=0x0201
usb 1-3.2: new high-speed USB device number 5 using xhci_hcd
[drm] REG_WAIT timeout 1us * 10 tries - mpc2_assert_idle_mpcc line:480

> > -12 is just -ENOMEM. Looks like a memory leak to me, maybe caused by
> > the problem above, maybe something completely unrelated.
> >
> > I will take a look.
>
> This looks like a completely unrelated memory leak to me.
>
> Probably best if you open up a bug report for this.

Yes, the monitor still turns off after applying the patch "make the pool
shrinker lock a mutex".
Anyway, the patch fixed the flood of "BUG: sleeping function called
from invalid context at mm/vmalloc.c:1756" messages, so the kernel
log became cleaner.
Now the monitor turning off looks like this in the logs:

DMA-API: cacheline tracking ENOMEM, dma-debug disabled
amdgpu 0000:0b:00.0: amdgpu: 000000006b791523 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin
framebuffer with error -12
BUG: kernel NULL pointer dereference, address: 0000000000000060
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 20 PID: 3780 Comm: brave:cs0 Tainted: GW-
---  5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 2802 10/21/2020
RIP: 0010:ttm_tt_swapin+0x34/0x1b0 [ttm]
Code: 55 41 54 55 53 48 83 ec 10 48 8b 47 20 48 89 44 24 08 48 85 c0
0f 84 86 01 00 00 48 8b 44 24 08 49 89 fc 4c 8b a8 e0 01 00 00 <41> 8b
45 60 89 44 24 04 8b 47 0c 85 c0 0f 84 df 00 00 00 31 db 65
RSP: 0018:a7400532b9c0 EFLAGS: 00010286
RAX: 978e2ae25800 RBX: 97910ec12058 RCX: 978e12caac70
RDX: 8010 RSI:  RDI: 97912c3d99c0
RBP: 97912c3d99c0 R08:  R09: 70b3a000
R10: 0002 R11:  R12: 97912c3d99c0

Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-11 Thread Christian König

On 11.01.21 at 10:03, Christian König wrote:

Hi Mikhail

On 10.01.21 at 23:26, Mikhail Gavrilov wrote:

Hi folks,
today I started testing kernel 5.11 and saw that the kernel log was
flooded with BUG messages:
BUG: sleeping function called from invalid context at mm/vmalloc.c:1756
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 266, name: kswapd0

INFO: lockdep is turned off.
CPU: 15 PID: 266 Comm: kswapd0 Tainted: G    W -
---  5.11.0-0.rc2.20210108gitf5e6c330254a.119.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 2802 10/21/2020
Call Trace:
  dump_stack+0x8b/0xb0
  ___might_sleep.cold+0xb6/0xc6
  vm_unmap_aliases+0x21/0x40
  change_page_attr_set_clr+0x9e/0x190
  set_memory_wb+0x2f/0x80
  ttm_pool_free_page+0x28/0x90 [ttm]
  ttm_pool_shrink+0x45/0xb0 [ttm]
  ttm_pool_shrinker_scan+0xa/0x20 [ttm]
  do_shrink_slab+0x177/0x3a0
  shrink_slab+0x9c/0x290
  shrink_node+0x2e6/0x700
  balance_pgdat+0x2f5/0x650
  kswapd+0x21d/0x4d0
  ? do_wait_intr_irq+0xd0/0xd0
  ? balance_pgdat+0x650/0x650
  kthread+0x13a/0x150
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x22/0x30


I'm probably responsible for this. Need to double check why we try to 
allocate memory while freeing some.


Changing the page table attributes while releasing memory might sleep. 
So we can't use a spinlock here.


Thanks for the report, a patch to fix this is on the mailing list now.


But the most unpleasant thing is that after a while the monitor turns
off and does not come back on until a restart.
This is accompanied by an entry in the kernel log:

amdgpu 0000:0b:00.0: amdgpu: 00000000ff7d8b94 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin
framebuffer with error -12


-12 is just -ENOMEM. Looks like a memory leak to me, maybe caused by 
the problem above, maybe something completely unrelated.


I will take a look.


This looks like a completely unrelated memory leak to me.

Probably best if you open up a bug report for this.

Thanks,
Christian.



Thanks,
Christian.



$ grep "Failed to pin framebuffer with error" -Rn .
./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:5816: DRM_ERROR("Failed to pin framebuffer with error %d\n", r);

$ git blame -L 5811,5821 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
Blaming lines:   0% (11/9167), done.
5d43be0ccbc2f (Christian König 2017-10-26 18:06:23 +0200 5811)         domain = AMDGPU_GEM_DOMAIN_VRAM;
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5812)
7b7c6c81b3a37 (Junwei Zhang    2018-06-25 12:51:14 +0800 5813)     r = amdgpu_bo_pin(rbo, domain);
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5814)     if (unlikely(r != 0)) {
30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5815)         if (r != -ERESTARTSYS)
30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5816)             DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
0f257b09531b4 (Chunming Zhou   2019-05-07 19:45:31 +0800 5817)         ttm_eu_backoff_reservation(&ticket, &list);
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5818)         return r;
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5819)     }
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5820)
bb812f1ea87dd (Junwei Zhang    2018-06-25 13:32:24 +0800 5821)     r = amdgpu_ttm_alloc_gart(&rbo->tbo);

Who knows how to fix it?

Full kernel logs are here:
[1] https://pastebin.com/fLasjDHX
[2] https://pastebin.com/g3wR2r9e


--
Best Regards,
Mike Gavrilov.






Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-11 Thread Christian König

Hi Mikhail

On 10.01.21 at 23:26, Mikhail Gavrilov wrote:

Hi folks,
today I started testing kernel 5.11 and saw that the kernel log was
flooded with BUG messages:
BUG: sleeping function called from invalid context at mm/vmalloc.c:1756
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 266, name: kswapd0
INFO: lockdep is turned off.
CPU: 15 PID: 266 Comm: kswapd0 Tainted: GW-
---  5.11.0-0.rc2.20210108gitf5e6c330254a.119.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 2802 10/21/2020
Call Trace:
  dump_stack+0x8b/0xb0
  ___might_sleep.cold+0xb6/0xc6
  vm_unmap_aliases+0x21/0x40
  change_page_attr_set_clr+0x9e/0x190
  set_memory_wb+0x2f/0x80
  ttm_pool_free_page+0x28/0x90 [ttm]
  ttm_pool_shrink+0x45/0xb0 [ttm]
  ttm_pool_shrinker_scan+0xa/0x20 [ttm]
  do_shrink_slab+0x177/0x3a0
  shrink_slab+0x9c/0x290
  shrink_node+0x2e6/0x700
  balance_pgdat+0x2f5/0x650
  kswapd+0x21d/0x4d0
  ? do_wait_intr_irq+0xd0/0xd0
  ? balance_pgdat+0x650/0x650
  kthread+0x13a/0x150
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x22/0x30


I'm probably responsible for this. Need to double check why we try to 
allocate memory while freeing some.



But the most unpleasant thing is that after a while the monitor turns
off and does not come back on until a restart.
This is accompanied by an entry in the kernel log:

amdgpu 0000:0b:00.0: amdgpu: 00000000ff7d8b94 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin
framebuffer with error -12


-12 is just -ENOMEM. Looks like a memory leak to me, maybe caused by the 
problem above, maybe something completely unrelated.


I will take a look.

Thanks,
Christian.



> $ grep "Failed to pin framebuffer with error" -Rn .
> ./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:5816: DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
>
> $ git blame -L 5811,5821 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> Blaming lines:   0% (11/9167), done.
> 5d43be0ccbc2f (Christian König 2017-10-26 18:06:23 +0200 5811)         domain = AMDGPU_GEM_DOMAIN_VRAM;
> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5812)
> 7b7c6c81b3a37 (Junwei Zhang    2018-06-25 12:51:14 +0800 5813)     r = amdgpu_bo_pin(rbo, domain);
> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5814)     if (unlikely(r != 0)) {
> 30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5815)         if (r != -ERESTARTSYS)
> 30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5816)             DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
> 0f257b09531b4 (Chunming Zhou   2019-05-07 19:45:31 +0800 5817)         ttm_eu_backoff_reservation(, );
> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5818)         return r;
> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5819)     }
> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5820)
> bb812f1ea87dd (Junwei Zhang    2018-06-25 13:32:24 +0800 5821)     r = amdgpu_ttm_alloc_gart(>tbo);
>
> Who knows how to fix it?
>
> Full kernel logs are here:
> [1] https://pastebin.com/fLasjDHX
> [2] https://pastebin.com/g3wR2r9e

--
Best Regards,
Mike Gavrilov.




[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

2021-01-10 Thread Mikhail Gavrilov
Hi folks,
today I joined the testing of kernel 5.11 and saw that the kernel log was
flooded with BUG messages:
BUG: sleeping function called from invalid context at mm/vmalloc.c:1756
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 266, name: kswapd0
INFO: lockdep is turned off.
CPU: 15 PID: 266 Comm: kswapd0 Tainted: GW- ---  5.11.0-0.rc2.20210108gitf5e6c330254a.119.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
Call Trace:
 dump_stack+0x8b/0xb0
 ___might_sleep.cold+0xb6/0xc6
 vm_unmap_aliases+0x21/0x40
 change_page_attr_set_clr+0x9e/0x190
 set_memory_wb+0x2f/0x80
 ttm_pool_free_page+0x28/0x90 [ttm]
 ttm_pool_shrink+0x45/0xb0 [ttm]
 ttm_pool_shrinker_scan+0xa/0x20 [ttm]
 do_shrink_slab+0x177/0x3a0
 shrink_slab+0x9c/0x290
 shrink_node+0x2e6/0x700
 balance_pgdat+0x2f5/0x650
 kswapd+0x21d/0x4d0
 ? do_wait_intr_irq+0xd0/0xd0
 ? balance_pgdat+0x650/0x650
 kthread+0x13a/0x150
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30

But the most unpleasant thing is that after a while the monitor turns
off and does not come back on until a restart.
This is accompanied by an entry in the kernel log:

amdgpu :0b:00.0: amdgpu: ff7d8b94 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

$ grep "Failed to pin framebuffer with error" -Rn .
./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:5816: DRM_ERROR("Failed to pin framebuffer with error %d\n", r);

$ git blame -L 5811,5821 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
Blaming lines:   0% (11/9167), done.
5d43be0ccbc2f (Christian König 2017-10-26 18:06:23 +0200 5811)         domain = AMDGPU_GEM_DOMAIN_VRAM;
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5812)
7b7c6c81b3a37 (Junwei Zhang    2018-06-25 12:51:14 +0800 5813)     r = amdgpu_bo_pin(rbo, domain);
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5814)     if (unlikely(r != 0)) {
30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5815)         if (r != -ERESTARTSYS)
30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5816)             DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
0f257b09531b4 (Chunming Zhou   2019-05-07 19:45:31 +0800 5817)         ttm_eu_backoff_reservation(, );
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5818)         return r;
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5819)     }
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5820)
bb812f1ea87dd (Junwei Zhang    2018-06-25 13:32:24 +0800 5821)     r = amdgpu_ttm_alloc_gart(>tbo);

Who knows how to fix it?

Full kernel logs are here:
[1] https://pastebin.com/fLasjDHX
[2] https://pastebin.com/g3wR2r9e

--
Best Regards,
Mike Gavrilov.