Hi Christian,
On 2024-09-27 16:11, Christian Kastner wrote:
> Am I interpreting this right that the "Killed" disappeared? If so, then the
> issue should be reproducible by re-enabling vm.overcommit_memory=0.
"Killed" did not appear in either case when I ran it myself. However, the run
did get further with vm.overcommit_memory=0:
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_67108864_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_67108864_odist_67108864_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_67108864_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_67108864_odist_67108864_ioffset_0_0_ooffset_0_0
(881 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(76872 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(11141 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(5230 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(5429 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_2_istride_1_CI_ostride_1_CP_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_2_istride_1_CI_ostride_1_CP_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(6498 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(2630 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(2718 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(8447 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(7018 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(3510 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_2_istride_1_CP_ostride_1_CP_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_2_istride_1_CP_ostride_1_CP_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(4090 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(3520 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(1766 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
(1771 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
clients/tests/accuracy_test.h:1214: Skipped
needed_ramgb: 96, ramgb limit: 61.
[ SKIPPED ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(0 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
clients/tests/accuracy_test.h:1214: Skipped
needed_ramgb: 96, ramgb limit: 61.
[ SKIPPED ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(0 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_4_istride_1_CP_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
clients/tests/accuracy_test.h:1214: Skipped
needed_ramgb: 96, ramgb limit: 61.
[ SKIPPED ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_4_istride_1_CP_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(0 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_2_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_2_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(67340 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_2_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_2_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(11059 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_2_istride_1_CI_ostride_1_CP_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_2_istride_1_CI_ostride_1_CP_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(12243 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_1_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_1_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(5412 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_1_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[ OK ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_1_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
(5695 ms)
[ RUN ]
pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_single_ip_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
command1 FAIL non-zero exit status 1
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
- Size: 2097152(0x200000) KB
+ Size: 31761860(0x1e4a5c4) KB
This is the pool from the gfx1035. It increased in size from 2GiB to ~32GiB.
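For reference, the two pool sizes can be converted with a short sketch like the one below. It assumes the "KB" in the pool info means KiB (1024 bytes); under that assumption the new size works out to about 30.3 GiB, or roughly 32.5 GB in decimal units.

```python
# Convert the pool sizes from the diff above into GiB.
# Assumption: "KB" in the pool info means KiB (1024 bytes).
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib: int) -> float:
    """Return a size given in KiB as GiB."""
    return kib / KIB_PER_GIB


old_pool_kib = 0x200000    # 2097152 KB, the "-" line
new_pool_kib = 0x1e4a5c4   # 31761860 KB, the "+" line

print(f"old pool: {kib_to_gib(old_pool_kib):.2f} GiB")  # old pool: 2.00 GiB
print(f"new pool: {kib_to_gib(new_pool_kib):.2f} GiB")  # new pool: 30.29 GiB
```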
If overcommit was indeed the issue behind "Killed", then I suspect that the
test malloc'ed so much memory that it eventually triggered the OOM killer once
the test and the GPU together had consumed all physical memory, e.g. with a
32GiB test case computed on both GPU and CPU for expected/actual comparison.
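To put rough numbers on the test sizes, here is a minimal sketch of the size of a single data buffer for some of the cases named in the log. It assumes a complex element is 8 bytes in single precision and 16 bytes in double precision, and that one buffer holds length * batch elements. It does not reproduce how rocfft-test arrives at its own needed_ramgb figure, which also has to account for however many buffers the test keeps alive at once.

```python
# Size of one data buffer for a few of the test cases named in the log.
# Assumptions: a complex element is two floats (8 bytes single, 16 bytes
# double) and one buffer holds length * batch elements. The needed_ramgb
# estimate used by rocfft-test itself is not reproduced here.
BYTES_PER_COMPLEX = {"single": 8, "double": 16}


def buffer_gib(length: int, precision: str, batch: int) -> float:
    """Return the size of one buffer in GiB."""
    return length * batch * BYTES_PER_COMPLEX[precision] / 2**30


cases = [
    (134217728, "double", 4),  # passed
    (268435456, "double", 4),  # skipped (needed_ramgb: 96)
    (268435456, "double", 2),  # passed
    (268435456, "single", 4),  # last test started before the failure
]

for length, precision, batch in cases:
    size = buffer_gib(length, precision, batch)
    print(f"len={length} {precision} batch={batch}: {size:.0f} GiB per buffer")
```

Under these assumptions the skipped double-precision batch_4 case is 16 GiB per buffer, while the single-precision batch_4 case that was running at the time of the failure is 8 GiB per buffer, the same as the double-precision batch_2 case that passed.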
The dmesg log shows the OOM killer being invoked with
vm.overcommit_memory=0:
[ 633.775419] rocfft-test invoked oom-killer:
gfp_mask=0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), order=0,
oom_score_adj=0
I've attached the rest of the dmesg log for the test. It has more details.
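As a quick sanity check on the attached log: the task table reports memory in pages, and the "Killed process" line reports kB. The sketch below assumes 4 KiB pages (the usual x86-64 default) and takes the rocfft-test values straight from the log. The converted values match the total-vm and anon-rss figures in the kill message.

```python
# Cross-check the rocfft-test row of the task table against the
# "Killed process" summary line in the attached dmesg log.
# Assumption: 4 KiB pages (the usual x86-64 default).
PAGE_KIB = 4
KIB_PER_GIB = 1024 * 1024

total_vm_pages = 20134547   # total_vm column for pid 4053
rss_anon_pages = 11468495   # rss_anon column for pid 4053
ram_pages = 16186823        # "pages RAM" line


def pages_to_kib(pages: int) -> int:
    """Convert a page count to KiB."""
    return pages * PAGE_KIB


for label, pages in [
    ("total-vm", total_vm_pages),
    ("anon-rss", rss_anon_pages),
    ("system RAM", ram_pages),
]:
    kib = pages_to_kib(pages)
    print(f"{label}: {kib} kB ({kib / KIB_PER_GIB:.2f} GiB)")
```

By this arithmetic the process had about 43.75 GiB of anonymous memory resident out of roughly 61.75 GiB of RAM when it was killed, with swap already exhausted (Free swap = 0kB).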
Sincerely,
Cory Bloor
[ 633.775419] rocfft-test invoked oom-killer:
gfp_mask=0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), order=0,
oom_score_adj=0
[ 633.775426] CPU: 7 PID: 4053 Comm: rocfft-test Not tainted 6.10.6+bpo-amd64
#1 Debian 6.10.6-1~bpo12+1
[ 633.775429] Hardware name: Micro Computer (HK) Tech Limited UM773
Lite/F7BFD, BIOS 1.06 02/27/2023
[ 633.775430] Call Trace:
[ 633.775432] <TASK>
[ 633.775434] dump_stack_lvl+0x64/0x80
[ 633.775440] dump_header+0x44/0x1b0
[ 633.775444] oom_kill_process+0xfa/0x200
[ 633.775447] out_of_memory+0x257/0x520
[ 633.775450] __alloc_pages_slowpath.constprop.0+0xaaa/0xd60
[ 633.775456] __alloc_pages_noprof+0x309/0x340
[ 633.775460] alloc_pages_mpol_noprof+0xd9/0x1e0
[ 633.775464] pte_alloc_one+0x1d/0x60
[ 633.775468] __pte_alloc+0x2a/0xb0
[ 633.775472] do_anonymous_page+0x52b/0x7b0
[ 633.775474] ? lruvec_stat_mod_folio.constprop.0+0x1c/0x30
[ 633.775476] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.775479] ? __pmd_alloc+0x148/0x200
[ 633.775481] __handle_mm_fault+0xc3e/0x1070
[ 633.775484] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.775488] handle_mm_fault+0x190/0x320
[ 633.775491] hmm_vma_fault.isra.0+0x4d/0x90
[ 633.775495] walk_pgd_range+0x34d/0xa90
[ 633.775500] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.775502] __walk_page_range+0x198/0x1b0
[ 633.775505] walk_page_range+0x13d/0x200
[ 633.775508] hmm_range_fault+0x5f/0xa0
[ 633.775513] amdgpu_hmm_range_get_pages+0x144/0x260 [amdgpu]
[ 633.775717] amdgpu_ttm_tt_get_user_pages+0xc1/0x1a0 [amdgpu]
[ 633.775838] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x50e/0xb40 [amdgpu]
[ 633.775991] kfd_ioctl_alloc_memory_of_gpu+0xd5/0x270 [amdgpu]
[ 633.776139] kfd_ioctl+0x3af/0x4c0 [amdgpu]
[ 633.776280] ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu]
[ 633.776422] __x64_sys_ioctl+0x97/0xd0
[ 633.776426] do_syscall_64+0x82/0x190
[ 633.776432] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.776434] ? vm_mmap_pgoff+0x131/0x1c0
[ 633.776437] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.776439] ? syscall_exit_to_user_mode+0x77/0x210
[ 633.776441] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.776443] ? do_syscall_64+0x8e/0x190
[ 633.776445] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.776446] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 633.776450] RIP: 0033:0x7fe8bf3164bb
[ 633.776454] Code: Unable to access opcode bytes at 0x7fe8bf316491.
[ 633.776455] RSP: 002b:00007ffd9f7cddb0 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 633.776457] RAX: ffffffffffffffda RBX: 00000000c0000004 RCX: 00007fe8bf3164bb
[ 633.776459] RDX: 00007ffd9f7cde50 RSI: 00000000c0284b16 RDI: 0000000000000003
[ 633.776460] RBP: 00007ffd9f7cde50 R08: 00007ffd9f7cdf48 R09: 00000000c0000004
[ 633.776461] R10: 0000000000004022 R11: 0000000000000246 R12: 00000000c0284b16
[ 633.776462] R13: 0000000000000003 R14: 00007ffd9f7cdf48 R15: 00007fe8c4a71278
[ 633.776466] </TASK>
[ 633.776467] Mem-Info:
[ 633.776470] active_anon:10824089 inactive_anon:666438 isolated_anon:0
active_file:82 inactive_file:164 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:7541 slab_unreclaimable:23829
mapped:44 shmem:294 pagetables:24401
sec_pagetables:1054 bounce:0
kernel_misc_reclaimable:0
free:37599 free_pcp:19 free_cma:0
[ 633.776473] Node 0 active_anon:43296924kB inactive_anon:2665184kB
active_file:328kB inactive_file:656kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB mapped:176kB dirty:0kB writeback:0kB shmem:1176kB
shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:44812288kB writeback_tmp:0kB
kernel_stack:5824kB pagetables:97604kB sec_pagetables:4216kB all_unreclaimable?
no
[ 633.776477] Node 0 DMA free:824kB boost:0kB min:16kB low:28kB high:40kB
reserved_highatomic:0KB active_anon:80kB inactive_anon:10336kB active_file:0kB
inactive_file:24kB unevictable:0kB writepending:0kB present:15996kB
managed:15360kB mlocked:0kB bounce:0kB free_pcp:8kB local_pcp:0kB free_cma:0kB
[ 633.776481] lowmem_reserve[]: 0 2725 61975 0 0
[ 633.776486] Node 0 DMA32 free:2928kB boost:0kB min:2968kB low:5756kB
high:8544kB reserved_highatomic:0KB active_anon:264kB inactive_anon:237144kB
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
present:2895008kB managed:2829172kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
[ 633.776490] lowmem_reserve[]: 0 0 59249 0 0
[ 633.776494] Node 0 Normal free:146644kB boost:194924kB min:259516kB
low:320184kB high:380852kB reserved_highatomic:0KB active_anon:43297812kB
inactive_anon:2416472kB active_file:524kB inactive_file:16kB unevictable:0kB
writepending:0kB present:61836288kB managed:60679192kB mlocked:0kB bounce:0kB
free_pcp:68kB local_pcp:0kB free_cma:0kB
[ 633.776498] lowmem_reserve[]: 0 0 0 0 0
[ 633.776501] Node 0 DMA: 2*4kB (U) 5*8kB (U) 16*16kB (U) 16*32kB (U) 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 816kB
[ 633.776513] Node 0 DMA32: 0*4kB 0*8kB 1*16kB (U) 2*32kB (UM) 1*64kB (M)
1*128kB (U) 1*256kB (U) 1*512kB (U) 2*1024kB (UM) 0*2048kB 0*4096kB = 3088kB
[ 633.776525] Node 0 Normal: 1608*4kB (UE) 2125*8kB (UE) 2767*16kB (UE)
2436*32kB (UE) 6*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =
146040kB
[ 633.776537] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=1048576kB
[ 633.776538] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=2048kB
[ 633.776540] 883 total pagecache pages
[ 633.776541] 341 pages in swap cache
[ 633.776542] Free swap = 0kB
[ 633.776542] Total swap = 999420kB
[ 633.776543] 16186823 pages RAM
[ 633.776544] 0 pages HighMem/MovableOnly
[ 633.776545] 305892 pages reserved
[ 633.776545] 0 pages hwpoisoned
[ 633.776546] Tasks state (memory values in pages):
[ 633.776547] [ pid ] uid tgid total_vm rss rss_anon rss_file
rss_shmem pgtables_bytes swapents oom_score_adj name
[ 633.776555] [ 434] 0 434 24307 224 224 0
0 221184 0 -250 systemd-journal
[ 633.776559] [ 455] 0 455 6760 443 384 59
0 77824 0 -1000 systemd-udevd
[ 633.776562] [ 647] 997 647 22520 286 224 62
0 81920 0 0 systemd-timesyn
[ 633.776564] [ 666] 0 666 1468 226 160 66
0 53248 0 0 dhclient
[ 633.776566] [ 685] 0 685 1652 51 0 51
0 57344 0 0 cron
[ 633.776568] [ 686] 100 686 2309 64 64 0
0 57344 0 -900 dbus-daemon
[ 633.776571] [ 688] 0 688 6202 2453 2432 21
0 81920 0 0 gpuenv-server
[ 633.776573] [ 690] 0 690 38186 61 32 29
0 69632 0 -1000 lxcfs
[ 633.776575] [ 692] 0 692 55447 286 224 62
0 94208 0 0 rsyslogd
[ 633.776577] [ 693] 0 693 12451 250 224 26
0 106496 0 0 systemd-logind
[ 633.776579] [ 698] 104 698 644 38 0 38
0 40960 0 0 debci-publisher
[ 633.776581] [ 715] 0 715 1458 79 32 47
0 49152 0 0 lxc-monitord
[ 633.776584] [ 721] 0 721 1468 82 32 50
0 53248 0 0 agetty
[ 633.776586] [ 724] 0 724 541388 3540 3540 0
0 442368 288 -999 containerd
[ 633.776588] [ 771] 0 771 3858 379 288 91
0 69632 0 -1000 sshd
[ 633.776590] [ 777] 104 777 4786 320 320 0
0 77824 0 100 systemd
[ 633.776592] [ 815] 104 815 669 37 0 37
0 45056 0 0 inotifywait
[ 633.776595] [ 820] 104 820 42286 861 791 70
0 98304 0 100 (sd-pam)
[ 633.776597] [ 897] 0 897 568107 2028 2028 0
0 557056 5728 0 dockerd
[ 633.776600] [ 962] 103 962 3544 138 104 34
0 65536 0 0 dnsmasq
[ 633.776602] [ 1188] 0 1188 4505 396 384 12
0 69632 0 0 sshd
[ 633.776604] [ 1191] 1000 1191 4764 487 448 39
0 77824 0 100 systemd
[ 633.776606] [ 1192] 1000 1192 42286 916 829 87
0 98304 0 100 (sd-pam)
[ 633.776609] [ 1211] 1000 1211 4570 383 328 55
0 73728 128 0 sshd
[ 633.776611] [ 1212] 1000 1212 2058 251 192 59
0 57344 192 0 bash
[ 633.776613] [ 1299] 1000 1299 2577 52 32 20
0 61440 64 0 sudo
[ 633.776616] [ 1300] 1000 1300 2577 94 37 57
0 57344 64 0 sudo
[ 633.776618] [ 1301] 0 1301 2325 46 0 46
0 61440 96 0 su
[ 633.776620] [ 1302] 104 1302 644 109 32 77
0 49152 0 0 sh
[ 633.776622] [ 1305] 104 1305 9771 501 432 69
0 118784 4064 0 autopkgtest
[ 633.776625] [ 1306] 104 1306 1367 61 0 61
0 49152 0 0 tee
[ 633.776627] [ 1307] 104 1307 1367 71 0 71
0 53248 0 0 tee
[ 633.776629] [ 1308] 104 1308 1637 64 32 32
0 57344 0 0 mawk
[ 633.776631] [ 1309] 104 1309 5649 436 311 125
0 81920 1824 0 autopkgtest-vir
[ 633.776633] [ 1339] 104 1339 268 59 0 59
0 40960 0 0 catatonit
[ 633.776635] [ 1361] 104 1361 2247 113 96 17
0 53248 0 200 dbus-daemon
[ 633.776637] [ 1364] 104 1364 2140 1366 1315 51
0 57344 0 0 fuse-overlayfs
[ 633.776640] [ 1367] 104 1367 2212 72 67 5
0 57344 0 0 conmon
[ 633.776642] [ 1370] 104 1370 645 42 0 42
0 49152 0 0 sleep
[ 633.776644] [ 1372] 104 1372 1480 106 56 50
0 53248 0 0 slirp4netns
[ 633.776646] [ 3979] 104 3979 446441 304 304 0
0 368640 1984 0 podman
[ 633.776648] [ 4002] 104 4002 502124 3983 3978 5
0 401408 0 0 podman
[ 633.776651] [ 4022] 104 4022 2212 127 68 59
0 53248 0 0 conmon
[ 633.776653] [ 4025] 104 4025 1015 98 64 34
0 57344 0 0 bash
[ 633.776655] [ 4028] 104 4028 1069 50 32 18
0 45056 32 0 su
[ 633.776657] [ 4029] 166536 4029 669 52 32 20
0 45056 0 0 wrapper.sh
[ 633.776659] [ 4041] 166536 4041 754 36 0 36
0 40960 0 0 tee
[ 633.776661] [ 4043] 166536 4043 754 34 0 34
0 45056 0 0 tee
[ 633.776663] [ 4045] 166536 4045 669 37 0 37
0 49152 0 0 sh
[ 633.776666] [ 4053] 166536 4053 20134547 11468505 11468495 10
0 94818304 234720 0 rocfft-test
[ 633.776668]
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-104.slice/user@104.service/user.slice/libpod-577c53714068510df069335c5a4e99b966e187f291e087208c17df3ad5fdb52d.scope/container,task=rocfft-test,pid=4053,uid=166536
[ 633.776686] Out of memory: Killed process 4053 (rocfft-test)
total-vm:80538188kB, anon-rss:45873980kB, file-rss:40kB, shmem-rss:0kB,
UID:166536 pgtables:92596kB oom_score_adj:0
[ 633.779547] rocfft-test: page allocation failure: order:0,
mode:0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO),
nodemask=(null),cpuset=user.slice,mems_allowed=0
[ 633.779554] CPU: 7 PID: 4053 Comm: rocfft-test Not tainted 6.10.6+bpo-amd64
#1 Debian 6.10.6-1~bpo12+1
[ 633.779556] Hardware name: Micro Computer (HK) Tech Limited UM773
Lite/F7BFD, BIOS 1.06 02/27/2023
[ 633.779557] Call Trace:
[ 633.779559] <TASK>
[ 633.779560] dump_stack_lvl+0x64/0x80
[ 633.779563] warn_alloc+0x164/0x1e0
[ 633.779567] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.779570] __alloc_pages_slowpath.constprop.0+0xc7b/0xd60
[ 633.779575] __alloc_pages_noprof+0x309/0x340
[ 633.779578] alloc_pages_mpol_noprof+0xd9/0x1e0
[ 633.779582] vma_alloc_folio_noprof+0x65/0xd0
[ 633.779584] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.779586] do_anonymous_page+0x2b0/0x7b0
[ 633.779588] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.779590] ? __pte_offset_map+0x1b/0x180
[ 633.779593] __handle_mm_fault+0xc3e/0x1070
[ 633.779596] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.779600] handle_mm_fault+0x190/0x320
[ 633.779602] hmm_vma_fault.isra.0+0x4d/0x90
[ 633.779605] walk_pgd_range+0x34d/0xa90
[ 633.779609] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.779612] __walk_page_range+0x198/0x1b0
[ 633.779615] walk_page_range+0x13d/0x200
[ 633.779618] hmm_range_fault+0x5f/0xa0
[ 633.779621] amdgpu_hmm_range_get_pages+0x144/0x260 [amdgpu]
[ 633.779791] amdgpu_ttm_tt_get_user_pages+0xc1/0x1a0 [amdgpu]
[ 633.779912] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x50e/0xb40 [amdgpu]
[ 633.780064] kfd_ioctl_alloc_memory_of_gpu+0xd5/0x270 [amdgpu]
[ 633.780211] kfd_ioctl+0x3af/0x4c0 [amdgpu]
[ 633.780352] ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu]
[ 633.780495] __x64_sys_ioctl+0x97/0xd0
[ 633.780498] do_syscall_64+0x82/0x190
[ 633.780503] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.780505] ? vm_mmap_pgoff+0x131/0x1c0
[ 633.780508] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.780509] ? syscall_exit_to_user_mode+0x77/0x210
[ 633.780512] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.780513] ? do_syscall_64+0x8e/0x190
[ 633.780515] ? srso_alias_return_thunk+0x5/0xfbef5
[ 633.780517] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 633.780519] RIP: 0033:0x7fe8bf3164bb
[ 633.780522] Code: Unable to access opcode bytes at 0x7fe8bf316491.
[ 633.780523] RSP: 002b:00007ffd9f7cddb0 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 633.780525] RAX: ffffffffffffffda RBX: 00000000c0000004 RCX: 00007fe8bf3164bb
[ 633.780526] RDX: 00007ffd9f7cde50 RSI: 00000000c0284b16 RDI: 0000000000000003
[ 633.780527] RBP: 00007ffd9f7cde50 R08: 00007ffd9f7cdf48 R09: 00000000c0000004
[ 633.780528] R10: 0000000000004022 R11: 0000000000000246 R12: 00000000c0284b16
[ 633.780529] R13: 0000000000000003 R14: 00007ffd9f7cdf48 R15: 00007fe8c4a71278
[ 633.780533] </TASK>
[ 633.780534] Mem-Info:
[ 633.780536] active_anon:10825835 inactive_anon:665439 isolated_anon:0
active_file:10 inactive_file:309 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:7541 slab_unreclaimable:23829
mapped:47 shmem:294 pagetables:24401
sec_pagetables:1054 bounce:0
kernel_misc_reclaimable:0
free:34054 free_pcp:2971 free_cma:0
[ 633.780539] Node 0 active_anon:43303340kB inactive_anon:2661756kB
active_file:40kB inactive_file:1236kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB mapped:188kB dirty:0kB writeback:0kB shmem:1176kB
shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:44812288kB writeback_tmp:0kB
kernel_stack:5824kB pagetables:97604kB sec_pagetables:4216kB all_unreclaimable?
no
[ 633.780543] Node 0 DMA free:756kB boost:0kB min:16kB low:28kB high:40kB
reserved_highatomic:0KB active_anon:96kB inactive_anon:10352kB active_file:28kB
inactive_file:16kB unevictable:0kB writepending:0kB present:15996kB
managed:15360kB mlocked:0kB bounce:0kB free_pcp:16kB local_pcp:8kB free_cma:0kB
[ 633.780547] lowmem_reserve[]: 0 2725 61975 0 0
[ 633.780552] Node 0 DMA32 free:1424kB boost:0kB min:2968kB low:5756kB
high:8544kB reserved_highatomic:0KB active_anon:0kB inactive_anon:238268kB
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
present:2895008kB managed:2829172kB mlocked:0kB bounce:0kB free_pcp:752kB
local_pcp:752kB free_cma:0kB
[ 633.780556] lowmem_reserve[]: 0 0 59249 0 0
[ 633.780560] Node 0 Normal free:134036kB boost:0kB min:64592kB low:125260kB
high:185928kB reserved_highatomic:0KB active_anon:44073220kB
inactive_anon:1643160kB active_file:0kB inactive_file:880kB unevictable:0kB
writepending:0kB present:61836288kB managed:60679192kB mlocked:0kB bounce:0kB
free_pcp:11364kB local_pcp:7572kB free_cma:0kB
[ 633.780563] lowmem_reserve[]: 0 0 0 0 0
[ 633.780567] Node 0 DMA: 1*4kB (U) 0*8kB 14*16kB (U) 16*32kB (U) 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 740kB
[ 633.780578] Node 0 DMA32: 1*4kB (M) 0*8kB 0*16kB 0*32kB 1*64kB (M) 1*128kB
(M) 1*256kB (M) 0*512kB 1*1024kB (M) 0*2048kB 0*4096kB = 1476kB
[ 633.780589] Node 0 Normal: 1*4kB (U) 1247*8kB (U) 2773*16kB (UE) 2442*32kB
(UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 132492kB
[ 633.780601] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=1048576kB
[ 633.780602] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=2048kB
[ 633.780603] 891 total pagecache pages
[ 633.780604] 342 pages in swap cache
[ 633.780605] Free swap = 0kB
[ 633.780606] Total swap = 999420kB
[ 633.780606] 16186823 pages RAM
[ 633.780607] 0 pages HighMem/MovableOnly
[ 633.780608] 305892 pages reserved
[ 633.780609] 0 pages hwpoisoned
[ 633.780633] amdgpu: init_user_pages: Failed to get user pages: -14