Re: [bug] 5.11-rc5 brought page allocation failure issue [ttm][amdgpu]

2021-01-31 Thread Hillf Danton
On Sun, 31 Jan 2021 04:17:46 +0500 Mikhail Gavrilov wrote:
> 
> The 5.11-rc5 (git 76c057c84d28) brought a new issue.
> Now the kernel log is flooded with the message "page allocation failure".

Thanks for your report.
> 
> Trace:
> msedge:cs0: page allocation failure: order:10,

An order-10 request (1024 contiguous pages) is prone to failure even without __GFP_NORETRY.
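
For reference, the size behind that remark can be worked out directly: an order-n request asks the buddy allocator for 2^n physically contiguous pages. A minimal sketch, assuming the 4 KiB base page size of the x86_64 machine in this report; order_to_kb() is a made-up helper for illustration, not a kernel API.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages, as on the x86_64 machine in this report. */
#define SKETCH_PAGE_SIZE_KB 4UL

/* Illustrative helper, not a kernel API: size in kB of one order-n block. */
static unsigned long order_to_kb(unsigned int order)
{
	/* Keep the shift well inside the width of unsigned long. */
	assert(order < 20);
	return SKETCH_PAGE_SIZE_KB << order;
}
```

Under that assumption order_to_kb(10) is 4096, so the request needs one free 4096kB block, and the DMA32 and Normal free lists quoted in the report both show 0*4096kB.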

> mode:0x190cc2(GFP_HIGHUSER|__GFP_NORETRY|__GFP_NOMEMALLOC),
> nodemask=(null),cpuset=/,mems_allowed=0
> CPU: 18 PID: 4540 Comm: msedge:cs0 Tainted: GW
> - ---  5.11.0-0.rc5.20210128git76c057c84d28.138.fc34.x86_64 #1
> Hardware name: System manufacturer System Product Name/ROG STRIX
> X570-I GAMING, BIOS 3402 01/13/2021
> Call Trace:
>  dump_stack+0x8b/0xb0
>  warn_alloc.cold+0x72/0xd6
>  ? _cond_resched+0x16/0x50
>  ? __alloc_pages_direct_compact+0x1a1/0x210
>  __alloc_pages_slowpath.constprop.0+0xf64/0xf90
>  ? kmem_cache_alloc+0x299/0x310
>  ? lock_acquire+0x173/0x380
>  ? trace_hardirqs_on+0x1b/0xe0
>  ? lock_release+0x1e9/0x400
>  __alloc_pages_nodemask+0x37d/0x400
>  ttm_pool_alloc+0x2a3/0x630 [ttm]
>  ttm_tt_populate+0x37/0xe0 [ttm]
>  ttm_bo_handle_move_mem+0x142/0x180 [ttm]
>  ttm_bo_evict+0x12e/0x1b0 [ttm]
>  ? kfree+0xeb/0x660
>  ? amdgpu_vram_mgr_new+0x34d/0x3d0 [amdgpu]
>  ttm_mem_evict_first+0x101/0x4d0 [ttm]
>  ttm_bo_mem_space+0x2c8/0x330 [ttm]
>  ttm_bo_validate+0x163/0x1c0 [ttm]
>  amdgpu_cs_bo_validate+0x82/0x190 [amdgpu]
>  amdgpu_cs_list_validate+0x105/0x150 [amdgpu]
>  amdgpu_cs_ioctl+0x803/0x1ef0 [amdgpu]
>  ? trace_hardirqs_off_caller+0x41/0xd0
>  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
>  drm_ioctl_kernel+0x8c/0xe0 [drm]
>  drm_ioctl+0x20f/0x3c0 [drm]
>  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
>  ? selinux_file_ioctl+0x147/0x200
>  ? lock_acquired+0x1fa/0x380
>  ? lock_release+0x1e9/0x400
>  ? trace_hardirqs_on+0x1b/0xe0
>  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
>  __x64_sys_ioctl+0x82/0xb0
>  do_syscall_64+0x33/0x40
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f829c36c11b
> Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c
> c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d
> 01 f0 ff ff 73 01 c3 48 8b 0d 25 bd 0c 00 f7 d8 64 89 01 48
> RSP: 002b:7f8282c14f38 EFLAGS: 0246 ORIG_RAX: 0010
> RAX: ffda RBX: 7f8282c14fa0 RCX: 7f829c36c11b
> RDX: 7f8282c14fa0 RSI: c0186444 RDI: 0018
> RBP: c0186444 R08: 7f8282c15640 R09: 7f8282c14f80
> R10:  R11: 0246 R12: 1f592c0fe088
> R13: 0018 R14:  R15: fffd
> Mem-Info:
> active_anon:24325 inactive_anon:3569299 isolated_anon:0
>  active_file:704540 inactive_file:2709725 isolated_file:0
>  unevictable:1230 dirty:256317 writeback:7074
>  slab_reclaimable:222328 slab_unreclaimable:112852
>  mapped:838359 shmem:469422 pagetables:47722 bounce:0
>  free:107165 free_pcp:1298 free_cma:0
> Node 0 active_anon:97300kB inactive_anon:14277196kB
> active_file:2818160kB inactive_file:10838900kB unevictable:4920kB
> isolated(anon):0kB isolated(file):0kB mapped:3353436kB dirty:1025268kB
> writeback:28296kB shmem:1877688kB shmem_thp: 0kB shmem_pmdmapped: 0kB
> anon_thp: 0kB writeback_tmp:0kB kernel_stack:62528kB
> pagetables:190888kB all_unreclaimable? no
> Node 0 DMA free:11800kB min:32kB low:44kB high:56kB
> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
> present:15992kB managed:15900kB mlocked:0kB bounce:0kB free_pcp:0kB
> local_pcp:0kB free_cma:0kB
> lowmem_reserve[]: 0 3056 31787 31787 31787
> Node 0 DMA32 free:303044kB min:6492kB low:9620kB high:12748kB
> reserved_highatomic:0KB active_anon:20kB inactive_anon:1322808kB
> active_file:5136kB inactive_file:483136kB unevictable:0kB
> writepending:220876kB present:3314552kB managed:3246620kB mlocked:0kB
> bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB
> lowmem_reserve[]: 0 0 28731 28731 28731
> Node 0 Normal free:113816kB min:61052kB low:90472kB high:119892kB
> reserved_highatomic:0KB active_anon:97280kB inactive_anon:12953852kB
> active_file:2812656kB inactive_file:10355000kB unevictable:4920kB
> writepending:832688kB present:30133248kB managed:29421044kB
> mlocked:4920kB bounce:0kB free_pcp:5180kB local_pcp:4kB free_cma:0kB
> lowmem_reserve[]: 0 0 0 0 0
> Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U)
> 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11800kB
> Node 0 DMA32: 1009*4kB (UME) 724*8kB (UME) 488*16kB (UME) *32kB
> (UME) 950*64kB (UME) 620*128kB (UME) 223*256kB (UME) 74*512kB (M)
> 11*1024kB (M) 2*2048kB (ME) 0*4096kB = 303684kB
> Node 0 Normal: 964*4kB (UME) 719*8kB (ME) 379*16kB (UME) 192*32kB
> (UME) 127*64kB (UME) 130*128kB (UME) 122*256kB (UME) 18*512kB (UME)
> 4*1024kB (UM) 11*2048kB (UM) 0*4096kB = 113656kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
> hugepages_size=1048576kB
> Node 0 hugepages_total=0 hug

Re: BUG: kernel NULL pointer dereference, address: 0000000000000026 after switching to 5.7 kernel

2020-04-11 Thread Hillf Danton


On Sat, 11 Apr 2020 00:51:48 +0500 Mikhail Gavrilov wrote:
> Hi folks.
> After upgrade kernel to 5.7 I see every boot in kernel log following
> error messages:
> 
> [2.569513] [drm] Found UVD firmware ENC: 1.2 DEC: .43 Family ID: 19
> [2.569538] [drm] PSP loading UVD firmware
> [2.570038] BUG: kernel NULL pointer dereference, address: 0026
> [2.570045] #PF: supervisor read access in kernel mode
> [2.570050] #PF: error_code(0x) - not-present page
> [2.570055] PGD 0 P4D 0
> [2.570060] Oops:  [#1] SMP NOPTI
> [2.570065] CPU: 5 PID: 667 Comm: uvd_enc_1.1 Not tainted
> 5.7.0-0.rc0.git6.1.2.fc33.x86_64 #1
> [2.570072] Hardware name: System manufacturer System Product
> Name/ROG STRIX X570-I GAMING, BIOS 1405 11/19/2019
> [2.570085] RIP: 0010:__kthread_should_park+0x5/0x30
> [2.570090] Code: 00 e9 fe fe ff ff e8 ca 3a 08 00 e9 49 fe ff ff
> 48 89 df e8 dd 38 08 00 84 c0 0f 84 6a ff ff ff e9 a6 fe ff ff 0f 1f
> 44 00 00  47 26 20 74 12 48 8b 87 88 09 00 00 48 8b 00 48 c1 e8 02
> 83 e0
> [2.570103] RSP: 0018:ad8141723e50 EFLAGS: 00010246
> [2.570107] RAX: 7fff RBX: 8a8d1d116ed8 RCX: 
> 
> [2.570112] RDX:  RSI:  RDI: 
> 
> [2.570116] RBP: 8a8d28c11300 R08:  R09: 
> 
> [2.570120] R10:  R11:  R12: 
> 8a8d1d152e40
> [2.570125] R13: 8a8d1d117280 R14: 8a8d1d116ed8 R15: 
> 8a8d1ca68000
> [2.570131] FS:  () GS:8a8d3aa0()
> knlGS:
> [2.570137] CS:  0010 DS:  ES:  CR0: 80050033
> [2.570142] CR2: 0026 CR3: 0007e3dc6000 CR4: 
> 003406e0
> [2.570147] Call Trace:
> [2.570157]  drm_sched_get_cleanup_job+0x42/0x130 [gpu_sched]
> [2.570166]  drm_sched_main+0x6f/0x530 [gpu_sched]
> [2.570173]  ? lockdep_hardirqs_on+0x11e/0x1b0
> [2.570179]  ? drm_sched_get_cleanup_job+0x130/0x130 [gpu_sched]
> [2.570185]  kthread+0x131/0x150
> [2.570189]  ? __kthread_bind_mask+0x60/0x60
> [2.570196]  ret_from_fork+0x27/0x50
> [2.570203] Modules linked in: fjes(-) amdgpu(+) amd_iommu_v2
> gpu_sched ttm drm_kms_helper drm crc32c_intel igb nvme nvme_core dca
> i2c_algo_bit wmi pinctrl_amd br_netfilter bridge stp llc fuse
> [2.570223] CR2: 0026
> [2.570228] ---[ end trace 80c25d326e1e0d7c ]---
> [2.570233] RIP: 0010:__kthread_should_park+0x5/0x30
> [2.570238] Code: 00 e9 fe fe ff ff e8 ca 3a 08 00 e9 49 fe ff ff
> 48 89 df e8 dd 38 08 00 84 c0 0f 84 6a ff ff ff e9 a6 fe ff ff 0f 1f
> 44 00 00  47 26 20 74 12 48 8b 87 88 09 00 00 48 8b 00 48 c1 e8 02
> 83 e0
> [2.570250] RSP: 0018:ad8141723e50 EFLAGS: 00010246
> [2.570255] RAX: 7fff RBX: 8a8d1d116ed8 RCX: 
> 
> [2.570260] RDX:  RSI:  RDI: 
> 
> [2.570265] RBP: 8a8d28c11300 R08:  R09: 
> 
> [2.570271] R10:  R11:  R12: 
> 8a8d1d152e40
> [2.570276] R13: 8a8d1d117280 R14: 8a8d1d116ed8 R15: 
> 8a8d1ca68000
> [2.570281] FS:  () GS:8a8d3aa0()
> knlGS:
> [2.570287] CS:  0010 DS:  ES:  CR0: 80050033
> [2.570292] CR2: 0026 CR3: 0007e3dc6000 CR4: 
> 003406e0
> [2.570299] BUG: sleeping function called from invalid context at
> include/linux/percpu-rwsem.h:49
> [2.570306] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid:
> 667, name: uvd_enc_1.1
> [2.570311] INFO: lockdep is turned off.
> [2.570315] irq event stamp: 14
> [2.570319] hardirqs last  enabled at (13): []
> _raw_spin_unlock_irqrestore+0x46/0x60
> [2.570330] hardirqs last disabled at (14): []
> trace_hardirqs_off_thunk+0x1a/0x1c
> [2.570338] softirqs last  enabled at (0): []
> copy_process+0x706/0x1bc0
> [2.570345] softirqs last disabled at (0): [<>] 0x0
> [2.570351] CPU: 5 PID: 667 Comm: uvd_enc_1.1 Tainted: G  D
>   5.7.0-0.rc0.git6.1.2.fc33.x86_64 #1
> [2.570358] Hardware name: System manufacturer System Product
> Name/ROG STRIX X570-I GAMING, BIOS 1405 11/19/2019
> [2.570365] Call Trace:
> [2.570373]  dump_stack+0x8b/0xc8
> [2.570380]  ___might_sleep.cold+0xb6/0xc6
> [2.570385]  exit_signals+0x1c/0x2d0
> [2.570390]  do_exit+0xb1/0xc30
> [2.570395]  ? kthread+0x131/0x150
> [2.570400]  rewind_stack_do_exit+0x17/0x20
> [2.570559] [drm] Found VCE firmware Version: 57.6 Binary ID: 4
> [2.570572] [drm] PSP loading VCE firmware
> [3.146462] [drm] reserve 0x40 from 0x83fe80 for PSP TMR
> 
> $ /usr/src/kernels/`uname -r`/scripts/faddr2line
> /lib/debug/lib/modules/`uname -r`/vmlinux __kthread_should_park+0x5
> __kthread_should_park+0x5/0x30:
> to_

Re: KASAN: use-after-free Read in vgem_gem_dumb_create

2020-02-01 Thread Hillf Danton


Fri, 31 Jan 2020 14:28:10 -0800 (PST)
> syzbot found the following crash on:
> 
> HEAD commit:39bed42d Merge tag 'for-linus-hmm' of git://git.kernel.org..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=179465bee0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=2646535f8818ae25
> dashboard link: https://syzkaller.appspot.com/bug?extid=0dc774d419e916c8
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=16251279e0
> 
> The bug was bisected to:
> 
> commit 7611750784664db46d0db95631e322aeb263dde7
> Author: Alex Deucher 
> Date:   Wed Jun 21 16:31:41 2017 +
> 
> drm/amdgpu: use kernel is_power_of_2 rather than local version
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=11628df1e0
> final crash:https://syzkaller.appspot.com/x/report.txt?x=13628df1e0
> console output: https://syzkaller.appspot.com/x/log.txt?x=15628df1e0
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0dc774d419e91...@syzkaller.appspotmail.com
> Fixes: 761175078466 ("drm/amdgpu: use kernel is_power_of_2 rather than local 
> version")
> 
> ==
> BUG: KASAN: use-after-free in vgem_gem_dumb_create+0x238/0x250 
> drivers/gpu/drm/vgem/vgem_drv.c:221
> Read of size 8 at addr 88809fa67908 by task syz-executor.0/14871
> 
> CPU: 0 PID: 14871 Comm: syz-executor.0 Not tainted 5.5.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x197/0x210 lib/dump_stack.c:118
>  print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
>  __kasan_report.cold+0x1b/0x32 mm/kasan/report.c:506
>  kasan_report+0x12/0x20 mm/kasan/common.c:639
>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
>  vgem_gem_dumb_create+0x238/0x250 drivers/gpu/drm/vgem/vgem_drv.c:221
>  drm_mode_create_dumb+0x282/0x310 drivers/gpu/drm/drm_dumb_buffers.c:94
>  drm_mode_create_dumb_ioctl+0x26/0x30 drivers/gpu/drm/drm_dumb_buffers.c:100
>  drm_ioctl_kernel+0x244/0x300 drivers/gpu/drm/drm_ioctl.c:786
>  drm_ioctl+0x54e/0xa60 drivers/gpu/drm/drm_ioctl.c:886
>  vfs_ioctl fs/ioctl.c:47 [inline]
>  ksys_ioctl+0x123/0x180 fs/ioctl.c:747
>  __do_sys_ioctl fs/ioctl.c:756 [inline]
>  __se_sys_ioctl fs/ioctl.c:754 [inline]
>  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:754
>  do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x45b349
> Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 
> 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:7f871af46c78 EFLAGS: 0246 ORIG_RAX: 0010
> RAX: ffda RBX: 7f871af476d4 RCX: 0045b349
> RDX: 2180 RSI: c02064b2 RDI: 0003
> RBP: 0075bf20 R08:  R09: 
> R10:  R11: 0246 R12: 
> R13: 0285 R14: 004d14d0 R15: 0075bf2c
> 
> Allocated by task 14871:
>  save_stack+0x23/0x90 mm/kasan/common.c:72
>  set_track mm/kasan/common.c:80 [inline]
>  __kasan_kmalloc mm/kasan/common.c:513 [inline]
>  __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486
>  kasan_kmalloc+0x9/0x10 mm/kasan/common.c:527
>  kmem_cache_alloc_trace+0x158/0x790 mm/slab.c:3551
>  kmalloc include/linux/slab.h:556 [inline]
>  kzalloc include/linux/slab.h:670 [inline]
>  __vgem_gem_create+0x49/0x100 drivers/gpu/drm/vgem/vgem_drv.c:165
>  vgem_gem_create drivers/gpu/drm/vgem/vgem_drv.c:194 [inline]
>  vgem_gem_dumb_create+0xd7/0x250 drivers/gpu/drm/vgem/vgem_drv.c:217
>  drm_mode_create_dumb+0x282/0x310 drivers/gpu/drm/drm_dumb_buffers.c:94
>  drm_mode_create_dumb_ioctl+0x26/0x30 drivers/gpu/drm/drm_dumb_buffers.c:100
>  drm_ioctl_kernel+0x244/0x300 drivers/gpu/drm/drm_ioctl.c:786
>  drm_ioctl+0x54e/0xa60 drivers/gpu/drm/drm_ioctl.c:886
>  vfs_ioctl fs/ioctl.c:47 [inline]
>  ksys_ioctl+0x123/0x180 fs/ioctl.c:747
>  __do_sys_ioctl fs/ioctl.c:756 [inline]
>  __se_sys_ioctl fs/ioctl.c:754 [inline]
>  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:754
>  do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Freed by task 14871:
>  save_stack+0x23/0x90 mm/kasan/common.c:72
>  set_track mm/kasan/common.c:80 [inline]
>  kasan_set_free_info mm/kasan/common.c:335 [inline]
>  __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474
>  kasan_slab_free+0xe/0x10 mm/kasan/common.c:483
>  __cache_free mm/slab.c:3426 [inline]
>  kfree+0x10a/0x2c0 mm/slab.c:3757
>  vgem_gem_free_object+0xbe/0xe0 drivers/gpu/drm/vgem/vgem_drv.c:68
>  drm_gem_object_free+0x100/0x220 drivers/gpu/drm/drm_gem.

Re: KASAN: use-after-free Read in vgem_gem_dumb_create

2020-02-01 Thread Hillf Danton


On Sat, 1 Feb 2020 09:17:57 +0300 Dan Carpenter wrote:
> On Sat, Feb 01, 2020 at 12:32:09PM +0800, Hillf Danton wrote:
> >
> > Release obj in error path.
> > 
> > --- a/drivers/gpu/drm/vgem/vgem_drv.c
> > +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> > @@ -196,10 +196,10 @@ static struct drm_gem_object *vgem_gem_c
> > return ERR_CAST(obj);
> >  
> > ret = drm_gem_handle_create(file, &obj->base, handle);
> > -   drm_gem_object_put_unlocked(&obj->base);
> > -   if (ret)
> > +   if (ret) {
> > +   drm_gem_object_put_unlocked(&obj->base);
> > return ERR_PTR(ret);
> > -
> > +   }
> > return &obj->base;
> 
> Oh yeah.  It's weird that we never noticed the success path was broken.
> It's been that way for three years and no one noticed at all.  Very
> strange.
> 
> Anyway, it already gets freed on error in drm_gem_handle_create() so
> we should just delete the drm_gem_object_put_unlocked() here it looks
> like.

Good catch, Dan :P
Would you please post a patch at some convenient time next week?

Thanks
Hillf
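
The hazard behind this report can be modelled in plain userspace C. This is only a toy sketch under stated assumptions: toy_obj, toy_alloc, toy_get, toy_put and toy_create are hypothetical names, not DRM APIs, the refcount is a plain integer rather than a kref, and this is not the fix proposed in the thread. It shows the general rule that anything needed from an object should be read before the caller drops its own reference, since afterwards only the handle's reference keeps the object alive.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted GEM-like object. */
struct toy_obj {
	int refcount;
	unsigned long size;
};

static struct toy_obj *toy_alloc(unsigned long size)
{
	struct toy_obj *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return NULL;
	obj->refcount = 1;	/* creation reference, owned by the caller */
	obj->size = size;
	return obj;
}

static void toy_get(struct toy_obj *obj)
{
	obj->refcount++;
}

/* Drops one reference; returns 1 if the object was freed. */
static int toy_put(struct toy_obj *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0) {
		free(obj);
		return 1;
	}
	return 0;
}

/*
 * Create an object, publish it through a "handle" (modelled as an extra
 * reference stored in *handle), and report its size. The size is copied
 * out BEFORE the creation reference is dropped, so this function never
 * touches the object after its own put.
 */
static int toy_create(unsigned long size, struct toy_obj **handle,
		      unsigned long *size_out)
{
	struct toy_obj *obj = toy_alloc(size);

	if (!obj)
		return -1;
	toy_get(obj);		/* reference now held by the handle */
	*handle = obj;
	*size_out = obj->size;	/* read while we still hold a reference */
	toy_put(obj);		/* drop the creation reference */
	return 0;
}
```

In the KASAN report the read at vgem_drv.c:221 happens after the object was freed by the same task, which is the ordering this sketch avoids.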

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]

2019-09-04 Thread Hillf Danton
Daniel Vetter 
>>
>> Now 11:01pm and "gnome shell stuck warning" not appear since 19:17. So
>> looks like issue happens only when computer blocked and monitor in
>> power save mode.
>
> I'd bet on runtime pm or some other power saving feature in amdgpu
> shutting the interrupt handling down before we've handled all the
> interrupts. That would then result in a stuck fence.
>
> Do we already know which fence is stuck?

Any hint on how to collect or print that info would be welcome,
say line:xxx-yyy in path/to/amdgpu/zzz.c

Thanks
Hillf


Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]

2019-09-04 Thread Hillf Danton
On Tue, 3 Sep 2019 11:48:12 +0500 Mikhail Gavrilov wrote:

> On Fri, 30 Aug 2019 at 08:30, Hillf Danton  wrote:
> >
> > Add a warning to show whether the change makes sense in the field: it
> > does if the warning gets printed and still neither a regression nor a
> > problem is observed.
>
> I caught the problem.
> 
>
> [21793.094289] [ cut here ]
> [21793.094296] gnome shell stuck warning
> [21793.094391] WARNING: CPU: 14 PID: 1768 at
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:332
>
Thanks Mike.

Please describe the problems you are experiencing.
Say, is the screen locked up? Is the machine locked up?
Anything abnormal after you see the warning?

Hillf

Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]

2019-08-30 Thread Hillf Danton

On Fri, 30 Aug 2019 06:04:06 +0800 Mikhail Gavrilov wrote:
> On Sun, Aug 25, 2019 at 10:13:05PM +0800, Hillf Danton wrote:
> > Can we try to add the fallback timer manually?
> >
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -322,6 +322,10 @@ int amdgpu_fence_wait_empty(struct amdgp
> > }
> > rcu_read_unlock();
> > 
> > +   if (!timer_pending(&ring->fence_drv.fallback_timer))
> > +   mod_timer(&ring->fence_drv.fallback_timer,
> > +   jiffies + (AMDGPU_FENCE_JIFFIES_TIMEOUT << 1));
> > +
> > r = dma_fence_wait(fence, false);
> > dma_fence_put(fence);
> > return r;
> > --
> >
> > Or simply wait with an ear on signal and timeout if adding timer
> > seems to go a bit too far?
> >
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -322,7 +322,12 @@ int amdgpu_fence_wait_empty(struct amdgp
> > }
> > rcu_read_unlock();
> > 
> > -   r = dma_fence_wait(fence, false);
> > +   if (0 < dma_fence_wait_timeout(fence, true,
> > +   AMDGPU_FENCE_JIFFIES_TIMEOUT +
> > +   (AMDGPU_FENCE_JIFFIES_TIMEOUT >> 3)))
> > +   r = 0;
> > +   else
> > +   r = -EINVAL;
> > dma_fence_put(fence);

WARN(r, "gnome shell stuck warning\n");

> > return r;
> >  }
> 
> I tested both patches on top of 5.3 RC6. Each patch I was tested more
> than 24 hours and I don't seen any regressions or problems with them.
> 
Add a warning to show whether the change makes sense in the field: it
does if the warning gets printed and still neither a regression nor a
problem is observed.

Thanks
Hillf
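
For anyone reading the second patch: dma_fence_wait_timeout() returns a negative error (such as -ERESTARTSYS) when interrupted, 0 when the timeout elapses, and the remaining jiffies otherwise, and the patch folds the first two cases into -EINVAL. Below is a hedged userspace sketch of a mapping that keeps them apart. wait_ret_to_errno() and patch_timeout() are hypothetical helpers, and nothing is assumed about the actual value of AMDGPU_FENCE_JIFFIES_TIMEOUT.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: translate a dma_fence_wait_timeout()-style return
 * value (negative errno, 0 on timeout, remaining jiffies on success)
 * into 0 or a negative errno.
 */
static long wait_ret_to_errno(long ret)
{
	if (ret > 0)
		return 0;		/* fence signaled in time */
	if (ret == 0)
		return -ETIMEDOUT;	/* timeout elapsed */
	return ret;			/* interrupted or other error */
}

/* Timeout used by the second patch: base plus one eighth of base. */
static long patch_timeout(long base_jiffies)
{
	assert(base_jiffies >= 0);
	return base_jiffies + (base_jiffies >> 3);
}
```

Either nonzero result would still trigger the WARN(r, ...) above; the only difference is that the log can then tell a timeout from an interrupted wait.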


Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]

2019-08-26 Thread Hillf Danton

On Sun, 25 Aug 2019 04:28:01 -0700 Mikhail Gavrilov wrote:
> Hi folks,
> I left unblocked gnome-shell at noon, and when I returned at the
> evening I discovered than monitor not sleeping and show open gnome
> activity. At first, I thought that some application did not let fall
> asleep the system. But when I try to move the mouse, I realized that
> the system hanged. So I connect via ssh and tried to investigate the
> problem. I did not see anything strange in kernel logs. And my last
> idea before trying to kill the gnome-shell process was dumps tasks
> that are in uninterruptable (blocked) state.
> 
> After [Alt + PrnScr + W] I saw this:
> 
> [32840.701909] sysrq: Show Blocked State
> [32840.701976]   taskPC stack   pid father
> [32840.702407] gnome-shell D11240  1900   1830 0x
> [32840.702438] Call Trace:
> [32840.702446]  ? __schedule+0x352/0x900
> [32840.702453]  schedule+0x3a/0xb0
> [32840.702457]  schedule_timeout+0x289/0x3c0
> [32840.702461]  ? find_held_lock+0x32/0x90
> [32840.702464]  ? find_held_lock+0x32/0x90
> [32840.702469]  ? mark_held_locks+0x50/0x80
> [32840.702473]  ? _raw_spin_unlock_irqrestore+0x4b/0x60
> [32840.702478]  dma_fence_default_wait+0x1f5/0x340
> [32840.702482]  ? dma_fence_free+0x20/0x20
> [32840.702487]  dma_fence_wait_timeout+0x182/0x1e0
> [32840.702533]  amdgpu_fence_wait_empty+0xe7/0x210 [amdgpu]
> [32840.702577]  amdgpu_pm_compute_clocks+0x70/0x5f0 [amdgpu]
> [32840.702641]  dm_pp_apply_display_requirements+0x19e/0x1c0 [amdgpu]
> [32840.702705]  dce12_update_clocks+0xd8/0x110 [amdgpu]
> [32840.702766]  dc_commit_state+0x414/0x590 [amdgpu]
> [32840.702834]  amdgpu_dm_atomic_commit_tail+0xd1e/0x1cf0 [amdgpu]
> [32840.702840]  ? reacquire_held_locks+0xed/0x210
> [32840.702848]  ? ttm_eu_backoff_reservation+0xa5/0x160 [ttm]
> [32840.702853]  ? find_held_lock+0x32/0x90
> [32840.702855]  ? find_held_lock+0x32/0x90
> [32840.702860]  ? __lock_acquire+0x247/0x1910
> [32840.702867]  ? find_held_lock+0x32/0x90
> [32840.702871]  ? mark_held_locks+0x50/0x80
> [32840.702874]  ? _raw_spin_unlock_irq+0x29/0x40
> [32840.702877]  ? lockdep_hardirqs_on+0xf0/0x180
> [32840.702881]  ? _raw_spin_unlock_irq+0x29/0x40
> [32840.702884]  ? wait_for_completion_timeout+0x75/0x190
> [32840.702895]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
> [32840.702902]  commit_tail+0x3c/0x70 [drm_kms_helper]
> [32840.702909]  drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
> [32840.702922]  drm_atomic_connector_commit_dpms+0xd7/0x100 [drm]
> [32840.702936]  set_property_atomic+0xcc/0x140 [drm]
> [32840.702955]  drm_mode_obj_set_property_ioctl+0xcb/0x1c0 [drm]
> [32840.702968]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
> [32840.702978]  drm_ioctl_kernel+0xaa/0xf0 [drm]
> [32840.702990]  drm_ioctl+0x208/0x390 [drm]
> [32840.703003]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
> [32840.703007]  ? sched_clock_cpu+0xc/0xc0
> [32840.703012]  ? lockdep_hardirqs_on+0xf0/0x180
> [32840.703053]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
> [32840.703058]  do_vfs_ioctl+0x411/0x750
> [32840.703065]  ksys_ioctl+0x5e/0x90
> [32840.703069]  __x64_sys_ioctl+0x16/0x20
> [32840.703072]  do_syscall_64+0x5c/0xb0
> [32840.703076]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [32840.703079] RIP: 0033:0x7f8bcab0f00b
> [32840.703084] Code: Bad RIP value.
> [32840.703086] RSP: 002b:7ffe76c62338 EFLAGS: 0246 ORIG_RAX: 
> 0010
> [32840.703089] RAX: ffda RBX: 7ffe76c62370 RCX: 
> 7f8bcab0f00b
> [32840.703092] RDX: 7ffe76c62370 RSI: c01864ba RDI: 
> 0009
> [32840.703094] RBP: c01864ba R08: 0003 R09: 
> c0c0c0c0
> [32840.703096] R10: 56476c86a018 R11: 0246 R12: 
> 56476c8ad940
> [32840.703098] R13: 0009 R14: 0002 R15: 
> 0003
> [root@localhost ~]#
> [root@localhost ~]# ps aux | grep gnome-shell
> mikhail 1900  0.3  1.1 6447496 378696 tty2   Dl+  Aug24   2:10 > 
> /usr/bin/gnome-shell
> mikhail 2099  0.0  0.0 519984 23392 ?Ssl  Aug24   0:00 > 
> /usr/libexec/gnome-shell-calendar-server
> mikhail12214  0.0  0.0 399484 29660 pts/2Sl+  Aug24   0:00 > 
> /usr/bin/python3 /usr/bin/chrome-gnome-shell
> chrome-extension://gphhapmejobijbbhgpjhcjognlahblep/
> root   22957  0.0  0.0 216120  2456 pts/10   S+   03:59   0:00 > grep 
> --color=auto gnome-shell
> 
> After it, I tried to kill gnome-shell process with signal 9, but the
> process won't terminate after several unsuccessful attempts.
> 
> Only [Alt + PrnScr + B] helped reboot the hanging system.
> I am writing here because I hope some amdgpu hackers can look at the
> trace and understand what is happening.
> 
> Sorry, I dont know how to reproduce this bug. But the problem itself
> is very annoying.
> 
> Thanks.
> 
> GPU: AMD Radeon VII
> Kernel: 5.3 RC5
> 
Can we try to add the fallback timer manually?

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgp

Re: The issue with page allocation 5.3 rc1-rc2 (seems drm culprit here)

2019-08-08 Thread Hillf Danton

On Thu, 8 Aug 2019 13:32:06 +0800 Alex Deucher wrote:
> 
> On Wed, Aug 7, 2019 at 11:49 PM Mikhail Gavrilov wrote:
> >
> > Unfortunately error "gnome-shell: page allocation failure: order:4,
> > mode:0x40cc0(GFP_KERNEL|__GFP_COMP),
> > nodemask=(null),cpuset=/,mems_allowed=0" still happens even with
> > applying this patch.

Thanks Mikhail.

No surprise to see the warning, because that patch only adds kvmalloc as
a fallback on top of the current kmalloc, which still fails and warns
first. Any other difference observed?

> I think we can just drop the kmalloc altogether.

Dropping kmalloc altogether OTOH makes the reason for the vmalloc
fallback IMO, Sir?

> How about this patch?
> 
> From: Alex Deucher 
> Date: Thu, 8 Aug 2019 00:29:23 -0500
> Subject: [PATCH] drm/amd/display: use kvmalloc for dc_state
> 
> It's large and doesn't need contiguous memory.
> 
> Signed-off-by: Alex Deucher 
> ---

Looks good to me, provided a kvfree is added.
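
The kvfree remark matters because kvmalloc() may hand back either kmalloc or vmalloc memory, and kfree() only handles the former. A toy userspace model of an allocator that falls back, and a free routine that dispatches on where the memory came from. All names are hypothetical and the "contiguous" limit is chosen arbitrarily for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum toy_backend { TOY_CONTIG, TOY_VIRT };

/* Bookkeeping header placed in front of every toy allocation. */
union toy_hdr {
	enum toy_backend backend;
	max_align_t align;	/* keeps the payload suitably aligned */
};

/* Arbitrary illustrative limit for the "contiguous" backend. */
#define TOY_CONTIG_MAX 4096UL

static int toy_contig_frees, toy_virt_frees;

/* Use the contiguous backend for small sizes, else fall back. */
static void *toy_kvmalloc(size_t size)
{
	union toy_hdr *hdr = malloc(sizeof(*hdr) + size);

	if (!hdr)
		return NULL;
	hdr->backend = size <= TOY_CONTIG_MAX ? TOY_CONTIG : TOY_VIRT;
	return hdr + 1;
}

/* Free through whichever backend produced the memory. */
static void toy_kvfree(void *ptr)
{
	union toy_hdr *hdr;

	if (!ptr)
		return;
	hdr = (union toy_hdr *)ptr - 1;
	if (hdr->backend == TOY_CONTIG)
		toy_contig_frees++;
	else
		toy_virt_frees++;
	free(hdr);
}
```

A release path that keeps calling the contiguous-only free, as kfree() would in dc_state_free(), has no way to handle the fallback case, which is why the allocation and release sides need to change together.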

>  drivers/gpu/drm/amd/display/dc/core/dc.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc.c
> index 252b621d93a9..ef780a4e484a 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
> @@ -23,6 +23,7 @@
>   */
>  
>  #include 
> +#include 
>  
>  #include "dm_services.h"
>  
> @@ -1183,8 +1184,8 @@ bool dc_post_update_surfaces_to_stream(struct dc *dc)
>  
>  struct dc_state *dc_create_state(struct dc *dc)
>  {
> - struct dc_state *context = kzalloc(sizeof(struct dc_state),
> -GFP_KERNEL);
> + struct dc_state *context = kvzalloc(sizeof(struct dc_state),
> + GFP_KERNEL);
>  
>   if (!context)
>   return NULL;
> @@ -1204,11 +1205,11 @@ struct dc_state *dc_create_state(struct dc *dc)
>  struct dc_state *dc_copy_state(struct dc_state *src_ctx)
>  {
>   int i, j;
> - struct dc_state *new_ctx = kmemdup(src_ctx,
> - sizeof(struct dc_state), GFP_KERNEL);
> + struct dc_state *new_ctx = kvmalloc(sizeof(struct dc_state), 
> GFP_KERNEL);
>  
>   if (!new_ctx)
>   return NULL;
> + memcpy(new_ctx, src_ctx, sizeof(struct dc_state));
>  
>   for (i = 0; i < MAX_PIPES; i++) {
>   struct pipe_ctx *cur_pipe = 
> &new_ctx->res_ctx.pipe_ctx[i];
> -- 
> 2.20.1
> 


Re: The issue with page allocation 5.3 rc1-rc2 (seems drm culprit here)

2019-08-06 Thread Hillf Danton

On Tue, 6 Aug 2019 01:15:01 +0800 Mikhail Gavrilov wrote:
>
> Unfortunately couldn't check this patch because, with the patch, the
> kernel did not compile.
> Here is compile error messages:
>
> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c: In function
> 'dc_create_state':
> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:1178:13: error:
> implicit declaration of function 'kvzalloc'; did you mean 'kzalloc'?
> [-Werror=implicit-function-declaration]
>  1178 |   context = kvzalloc(sizeof(struct dc_state),
>   | ^~~~
>   | kzalloc
> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:1178:11: warning:
> assignment to 'struct dc_state *' from 'int' makes pointer from
> integer without a cast [-Wint-conversion]
>  1178 |   context = kvzalloc(sizeof(struct dc_state),
>   |   ^
> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c: In function 
> 'dc_copy_state':
> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:1203:13: error:
> implicit declaration of function 'kvmalloc'; did you mean 'kmalloc'?
> [-Werror=implicit-function-declaration]
>  1203 |   new_ctx = kvmalloc(sizeof(*new_ctx), GFP_KERNEL);
>   | ^~~~
>   | kmalloc
> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:1203:11: warning:
> assignment to 'struct dc_state *' from 'int' makes pointer from
> integer without a cast [-Wint-conversion]
>  1203 |   new_ctx = kvmalloc(sizeof(*new_ctx), GFP_KERNEL);
>   |   ^
> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c: In function 
> 'dc_state_free':
> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:1242:2: error:
> implicit declaration of function 'kvfree'; did you mean 'kzfree'?
> [-Werror=implicit-function-declaration]
>  1242 |  kvfree(context);
>   |  ^~
>   |  kzfree
> cc1: some warnings being treated as errors
> make[4]: *** [scripts/Makefile.build:274:
> drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.o] Error 1
> make[4]: *** Waiting for unfinished jobs
> make[3]: *** [scripts/Makefile.build:490: drivers/gpu/drm/amd/amdgpu] Error 2
> make[3]: *** Waiting for unfinished jobs
> make: *** [Makefile:1084: drivers] Error 2

My bad, respin with one header file added.

Hillf
-8<---

--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -23,6 +23,7 @@
  */

 #include 
+#include 

 #include "dm_services.h"

@@ -1174,8 +1175,12 @@ struct dc_state *dc_create_state(struct
struct dc_state *context = kzalloc(sizeof(struct dc_state),
   GFP_KERNEL);

-   if (!context)
-   return NULL;
+   if (!context) {
+   context = kvzalloc(sizeof(struct dc_state),
+  GFP_KERNEL);
+   if (!context)
+   return NULL;
+   }
/* Each context must have their own instance of VBA and in order to
 * initialize and obtain IP and SOC the base DML instance from DC is
 * initially copied into every context
@@ -1195,8 +1200,13 @@ struct dc_state *dc_copy_state(struct dc
struct dc_state *new_ctx = kmemdup(src_ctx,
sizeof(struct dc_state), GFP_KERNEL);

-   if (!new_ctx)
-   return NULL;
+   if (!new_ctx) {
+   new_ctx = kvmalloc(sizeof(*new_ctx), GFP_KERNEL);
+   if (new_ctx)
+   *new_ctx = *src_ctx;
+   else
+   return NULL;
+   }

for (i = 0; i < MAX_PIPES; i++) {
struct pipe_ctx *cur_pipe = 
&new_ctx->res_ctx.pipe_ctx[i];
@@ -1230,7 +1240,7 @@ static void dc_state_free(struct kref *k
 {
struct dc_state *context = container_of(kref, struct dc_state, 
refcount);
dc_resource_state_destruct(context);
-   kfree(context);
+   kvfree(context);
 }

 void dc_release_state(struct dc_state *context)
--


Re: The issue with page allocation 5.3 rc1-rc2 (seems drm culprit here)

2019-08-04 Thread Hillf Danton

On Mon, 5 Aug 2019 at 08:23, Mikhail Gavrilov wrote:
> Hi folks,
> Two weeks ago when commit 22051d9c4a57 coming to my system.
> Started happen randomly errors:
> "gnome-shell: page allocation failure: order:4,
> mode:0x40cc0(GFP_KERNEL|__GFP_COMP),
> nodemask=(null),cpuset=/,mems_allowed=0"
> Symptoms:
> The screen goes out as in energy saving.
> And it is impossible to wake the computer in a few minutes.
> 
> I am making bisect and looks like the first bad commit is 476e955dd679.
> Here full bisect logs: https://mega.nz/#F!kgYFxAIb!v1tcHANPy2ns1lh4LQLeIg
> 
> I wrote about my find to the amd-gfx mailing list, but no one answer me.
> Until yesterday, I thought it was a bug in the amdgpu driver.
> But yesterday, after the next occurrence of an error, the system hangs
> completely already with another error.

[pruned]

> [ 3225.313209] Xorg: page allocation failure: order:4, 
> mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), 
> nodemask=(null),cpuset=/,mems_allowed=0
> [ 3225.313300] CPU: 2 PID: 12717 Comm: Xorg Not tainted 
> 5.3.0-0.rc2.git4.1.fc31.x86_64 #1
> [ 3225.313303] Hardware name: System manufacturer System Product Name/ROG 
> STRIX X470-I GAMING, BIOS 2406 06/21/2019
> [ 3225.313306] Call Trace:
> [ 3225.313315]  dump_stack+0x85/0xc0
> [ 3225.313321]  warn_alloc.cold+0x7b/0xfb
> [ 3225.313329]  ? _cond_resched+0x15/0x30
> [ 3225.31]  ? __alloc_pages_direct_compact+0x181/0x1a0
> [ 3225.313341]  __alloc_pages_slowpath+0xfe1/0x1020
> [ 3225.313348]  ? __lock_acquire+0x247/0x1910
> [ 3225.313365]  __alloc_pages_nodemask+0x37f/0x400
> [ 3225.313374]  kmalloc_order+0x20/0x60
> [ 3225.313378]  kmalloc_order_trace+0x1d/0x120
> [ 3225.313498]  dc_create_state+0x1f/0x60 [amdgpu]
> [ 3225.313582]  amdgpu_dm_atomic_commit_tail+0xbd7/0x1cf0 [amdgpu]
> [ 3225.313596]  ? lockdep_hardirqs_on+0xf0/0x180
> [ 3225.313615]  ? debug_check_no_obj_freed+0x107/0x1d8
> [ 3225.313685]  ? dm_determine_update_type_for_commit+0x34c/0x420 [amdgpu]
> [ 3225.313778]  ? dm_determine_update_type_for_commit+0x34c/0x420 [amdgpu]
> [ 3225.313786]  ? kfree+0x1b6/0x3b0
> [ 3225.313860]  ? dm_determine_update_type_for_commit+0x34c/0x420 [amdgpu]
> [ 3225.313875]  ? __lock_acquire+0x247/0x1910
> [ 3225.313891]  ? find_held_lock+0x32/0x90
> [ 3225.313898]  ? mark_held_locks+0x50/0x80
> [ 3225.313907]  ? _raw_spin_unlock_irq+0x29/0x40
> [ 3225.313911]  ? lockdep_hardirqs_on+0xf0/0x180
> [ 3225.313921]  ? _raw_spin_unlock_irq+0x29/0x40
> [ 3225.313928]  ? wait_for_completion_timeout+0x75/0x190
> [ 3225.313958]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
> [ 3225.313972]  commit_tail+0x3c/0x70 [drm_kms_helper]
> [ 3225.313984]  drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
> [ 3225.313994]  drm_atomic_helper_disable_plane+0x82/0xb0 [drm_kms_helper]
> [ 3225.314043]  drm_mode_cursor_universal+0x12c/0x240 [drm]
> [ 3225.314147]  drm_mode_cursor_common+0xd8/0x230 [drm]
> [ 3225.314194]  ? drm_mode_setplane+0x1a0/0x1a0 [drm]
> [ 3225.314209]  drm_mode_cursor_ioctl+0x4d/0x70 [drm]
> [ 3225.314244]  drm_ioctl_kernel+0xaa/0xf0 [drm]
> [ 3225.314260]  drm_ioctl+0x208/0x390 [drm]
> [ 3225.314275]  ? drm_mode_setplane+0x1a0/0x1a0 [drm]
> [ 3225.314297]  ? lockdep_hardirqs_on+0xf0/0x180
> [ 3225.314376]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
> [ 3225.314384]  do_vfs_ioctl+0x411/0x750
> [ 3225.314395]  ksys_ioctl+0x5e/0x90
> [ 3225.314413]  __x64_sys_ioctl+0x16/0x20
> [ 3225.314417]  do_syscall_64+0x5c/0xb0
> [ 3225.314422]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 3225.314425] RIP: 0033:0x7fdde5b4007b
> [ 3225.314477] Code: 0f 1e fa 48 8b 05 0d 9e 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 9d 0c 00 f7 d8 64 89 01 48
> [ 3225.314485] RSP: 002b:7ffec481a6d8 EFLAGS: 0246 ORIG_RAX: 
> 0010
> [ 3225.314490] RAX: ffda RBX: 7ffec481a710 RCX: 
> 7fdde5b4007b
> [ 3225.314494] RDX: 7ffec481a710 RSI: c01c64a3 RDI: 
> 000e
> [ 3225.314496] RBP: c01c64a3 R08: 0080 R09: 
> 
> [ 3225.314499] R10: 0004 R11: 0246 R12: 
> 06f1
> [ 3225.314502] R13: 000e R14: 56201b5b5490 R15: 
> 56201bbe7820
> [ 3225.314992] Mem-Info:
> [ 3225.315020] active_anon:2784941 inactive_anon:601242 isolated_anon:0
> active_file:1926790 inactive_file:1763177 isolated_file:0
> unevictable:16 dirty:2244 writeback:0 unstable:0
> slab_reclaimable:542021 slab_unreclaimable:135707
> mapped:525720 shmem:421336 pagetables:32066 bounce:0
> free:81471 free_pcp:299 free_cma:0
> [ 3225.315026] Node 0 active_anon:11139764kB inactive_anon:2404968kB 
> active_file:7707160kB inactive_file:7052264kB unevictable:64kB 
> isolated(anon):0kB isolated(file):0kB mapped:2102880kB dirty:8976kB 
> writeback:0kB shmem:1685344kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon
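
For readers decoding the quoted log: an order:N request asks the page allocator for 2^N physically contiguous pages. The helpers below are only a userspace sketch of that arithmetic, not kernel code. The function names are made up for illustration, and the 4096-byte page size is an assumption (it is consistent with the Mem-Info figures above, where active_anon:2784941 pages is reported as 11139764kB).

```c
#include <stddef.h>

/* Number of contiguous pages covered by an allocation of the given order. */
static unsigned long order_to_pages(unsigned int order)
{
	return 1UL << order;
}

/* Size in bytes of an order-N request; page_size is an assumed parameter. */
static unsigned long order_to_bytes(unsigned int order, unsigned long page_size)
{
	return order_to_pages(order) * page_size;
}

/* Convert a page count, as printed in Mem-Info, into kB. */
static unsigned long pages_to_kb(unsigned long pages, unsigned long page_size)
{
	return pages * (page_size / 1024);
}
```

Under the 4 KiB assumption, the failing order:4 request from dc_create_state() needs 16 contiguous pages, that is 64 KiB in one piece, which is why it can fail while plenty of fragmented memory is still free.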

Re: [PATCH 02/12] dma-buf: add dma_buf_(un)map_attachment_locked variants v3

2019-05-28 Thread Hillf Danton

On Mon, 27 May 2019 18:56:20 +0800 Christian Koenig wrote:
> Thanks for the comments, but you are looking at a completely outdated 
> patchset.
> 
> If you are interested in the newest one please ping me and I'm going to CC you
> when I send out the next version.
> 
Ping...

Thanks
Hillf

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[RFC] drm/amdkfd: Use logical cpu id for building vcrat

2019-04-16 Thread Hillf Danton
Hi folks

In commit d1c234e2cd, arm64 was allowed to build kfd. Currently the physical
cpu id is used for building the vcrat on x86_64, while the logical cpu id is
used on arm64, even though the function name asks for an apicid. Can we use the
physical id on both arches, if it really has an advantage over the logical one,
as the following tiny diff does?

--- linux-5.1-rc4/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 2019-04-16 07:55:56.611685400 +0800
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 2019-04-16 09:16:50.506126600 +0800
@@ -1405,11 +1405,7 @@ static int kfd_cpumask_to_apic_id(const
 	first_cpu_of_numa_node = cpumask_first(cpumask);
 	if (first_cpu_of_numa_node >= nr_cpu_ids)
 		return -1;
-#ifdef CONFIG_X86_64
-	return cpu_data(first_cpu_of_numa_node).apicid;
-#else
-	return first_cpu_of_numa_node;
-#endif
+	return cpu_physical_id(first_cpu_of_numa_node);
 }
 
 /* kfd_numa_node_to_apic_id - Returns the APIC ID of the first logical processor
--


Or is the logical cpu id enough to do the work, with some cosmetic renaming of
the functions (not yet included in the following simple diff)?

thanks
Hillf


--- linux-5.1-rc4/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 2019-04-16 07:55:56.611685400 +0800
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 2019-04-16 09:18:24.546578400 +0800
@@ -1405,11 +1405,7 @@ static int kfd_cpumask_to_apic_id(const
 	first_cpu_of_numa_node = cpumask_first(cpumask);
 	if (first_cpu_of_numa_node >= nr_cpu_ids)
 		return -1;
-#ifdef CONFIG_X86_64
-	return cpu_data(first_cpu_of_numa_node).apicid;
-#else
 	return first_cpu_of_numa_node;
-#endif
 }
 
 /* kfd_numa_node_to_apic_id - Returns the APIC ID of the first logical processor
--
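
To make the difference between the two diffs concrete, here is a small userspace model, offered as a sketch rather than kernel code. Everything in it is an assumption for illustration: NR_CPU_IDS, the physical_id[] table and its values, the use of a plain bitmask in place of struct cpumask, and all function names. It only shows that the two variants agree on the error case and differ in which id they hand back for the first CPU of a node.

```c
#include <stdint.h>

#define NR_CPU_IDS 8	/* hypothetical CPU count for this model */

/* Hypothetical logical-to-physical (APIC-like) id table; values are made up. */
static const int physical_id[NR_CPU_IDS] = { 0, 2, 4, 6, 16, 18, 20, 22 };

/* Stand-in for cpumask_first(): lowest set bit, or NR_CPU_IDS if empty. */
static int mask_first(uint32_t mask)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/* Models the first diff: return the physical id of the first CPU. */
static int mask_to_physical_id(uint32_t mask)
{
	int first = mask_first(mask);

	if (first >= NR_CPU_IDS)
		return -1;
	return physical_id[first];
}

/* Models the second diff: return the logical id of the first CPU. */
static int mask_to_logical_id(uint32_t mask)
{
	int first = mask_first(mask);

	if (first >= NR_CPU_IDS)
		return -1;
	return first;
}
```

With a node mask of 0xF0 (logical CPUs 4 to 7) the physical variant returns 16 from the made-up table, while the logical variant returns 4; an empty mask gives -1 in both. Which of the two the vcrat consumer actually needs is the open question of this RFC.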
