Hi Christian,

On 2024-09-27 16:11, Christian Kastner wrote:
> Am I interpreting this right that the "Killed" disappeared? If so, then the
> issue should be reproducible by re-enabling vm.overcommit_memory=0.

"Killed" disappeared when I ran it myself in both cases. However, it did get further with vm.overcommit_memory=0:

[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_67108864_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_67108864_odist_67108864_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_67108864_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_67108864_odist_67108864_ioffset_0_0_ooffset_0_0 (881 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (76872 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (11141 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (5230 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (5429 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_2_istride_1_CI_ostride_1_CP_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_2_istride_1_CI_ostride_1_CP_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (6498 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_ip_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (2630 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_double_op_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (2718 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (8447 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_4_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (7018 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (3510 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_2_istride_1_CP_ostride_1_CP_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_2_istride_1_CP_ostride_1_CP_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (4090 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_2_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (3520 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_ip_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (1766 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_134217728_single_op_batch_1_istride_1_CI_ostride_1_CI_idist_134217728_odist_134217728_ioffset_0_0_ooffset_0_0 (1771 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
clients/tests/accuracy_test.h:1214: Skipped
needed_ramgb: 96, ramgb limit: 61.


[  SKIPPED ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0 (0 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
clients/tests/accuracy_test.h:1214: Skipped
needed_ramgb: 96, ramgb limit: 61.


[  SKIPPED ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0 (0 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_4_istride_1_CP_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
clients/tests/accuracy_test.h:1214: Skipped
needed_ramgb: 96, ramgb limit: 61.


[  SKIPPED ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_4_istride_1_CP_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0 (0 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_2_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_2_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0 (67340 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_2_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_2_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0 (11059 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_2_istride_1_CI_ostride_1_CP_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_2_istride_1_CI_ostride_1_CP_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0 (12243 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_1_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_ip_batch_1_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0 (5412 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_1_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
[       OK ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_double_op_batch_1_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0 (5695 ms)
[ RUN      ] pow2_1D/accuracy_test.vs_fftw/complex_forward_len_268435456_single_ip_batch_4_istride_1_CI_ostride_1_CI_idist_268435456_odist_268435456_ioffset_0_0_ooffset_0_0
command1             FAIL non-zero exit status 1

    Pool Info:
      Pool 1
        Segment:                 GLOBAL; FLAGS: COARSE GRAINED
-      Size:                    2097152(0x200000) KB
+      Size:                    31761860(0x1e4a5c4) KB
This is the pool from the gfx1035. It increased in size from 2 GiB to roughly 30 GiB.
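
As a rough check of the two sizes in the diff, here is a small Python sketch of the unit conversion. It assumes the "KB" printed in the pool info means units of 1024 bytes; the helper names are made up for illustration.

```python
# Pool sizes as printed in the diff, in KB.
# Assumption: "KB" here means units of 1024 bytes.
OLD_POOL_KB = 2097152
NEW_POOL_KB = 31761860


def kb_to_gib(kb: int) -> float:
    """Convert a size in 1024-byte units to binary gibibytes."""
    return kb / (1024 * 1024)


def kb_to_gb(kb: int) -> float:
    """Convert a size in 1024-byte units to decimal gigabytes."""
    return kb * 1024 / 1e9


if __name__ == "__main__":
    for label, kb in (("old", OLD_POOL_KB), ("new", NEW_POOL_KB)):
        print(f"{label}: {kb_to_gib(kb):.2f} GiB ({kb_to_gb(kb):.2f} GB)")
```

Under that assumption the old pool is exactly 2 GiB, and the new one is a little over 30 GiB, or about 32.5 GB in decimal units.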

If overcommit was indeed the issue behind "Killed", then I suspect that the 
test malloc'ed so much that it eventually triggered the OOM killer once the test and the 
GPU together consumed all physical memory, e.g. with a 32 GiB test case computed on both 
GPU and CPU for expected/actual comparison.
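
For reference when toggling the setting, a minimal Python sketch that reads the current overcommit mode and prints what it means. The path argument and the function name are only illustrative; the three mode values are the ones the Linux kernel documents for vm.overcommit_memory.

```python
from pathlib import Path

# Meanings of vm.overcommit_memory as documented for the Linux kernel.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def describe_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Return (mode, description) for the sysctl value stored at `path`."""
    value = int(Path(path).read_text().strip())
    return value, OVERCOMMIT_MODES.get(value, "unknown mode")


if __name__ == "__main__":
    try:
        mode, meaning = describe_overcommit()
        print(f"vm.overcommit_memory={mode}: {meaning}")
    except OSError as exc:
        # Not on Linux, or procfs is not mounted.
        print(f"could not read setting: {exc}")
```

Recording this next to each test run makes it easier to tell which mode a given log came from.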
The dmesg logs indicate the OOM killer activating with vm.overcommit_memory=0:

[  633.775419] rocfft-test invoked oom-killer: gfp_mask=0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=0

I've attached the rest of the dmesg log for the test. It has more details.
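
To pull the headline numbers out of the log, here is a small Python sketch that parses the "Out of memory: Killed process" summary line from the attached dmesg output and converts the sizes to GiB. The regular expression and the function name are my own, written against this one log, so other kernel versions may format the line differently.

```python
import re

# Summary line from the attached dmesg log, joined back onto one line.
SAMPLE = (
    "[  633.776686] Out of memory: Killed process 4053 (rocfft-test) "
    "total-vm:80538188kB, anon-rss:45873980kB, file-rss:40kB, shmem-rss:0kB, "
    "UID:166536 pgtables:92596kB oom_score_adj:0"
)

# Illustrative pattern, matched against the line format seen in this log only.
KILL_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_kill_line(line):
    """Return pid, command and sizes in GiB, or None if the line does not match."""
    m = KILL_RE.search(line)
    if m is None:
        return None
    kib_per_gib = 1024 * 1024
    return {
        "pid": int(m["pid"]),
        "comm": m["comm"],
        "total_vm_gib": int(m["total_vm"]) / kib_per_gib,
        "anon_rss_gib": int(m["anon_rss"]) / kib_per_gib,
    }


if __name__ == "__main__":
    info = parse_kill_line(SAMPLE)
    print(f"{info['comm']} (pid {info['pid']}): "
          f"{info['total_vm_gib']:.1f} GiB virtual, "
          f"{info['anon_rss_gib']:.1f} GiB anonymous RSS")
```

For this log that works out to roughly 77 GiB of virtual memory and about 44 GiB of anonymous RSS for rocfft-test at the time it was killed.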

Sincerely,
Cory Bloor
[  633.775419] rocfft-test invoked oom-killer: 
gfp_mask=0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), order=0, 
oom_score_adj=0
[  633.775426] CPU: 7 PID: 4053 Comm: rocfft-test Not tainted 6.10.6+bpo-amd64 
#1  Debian 6.10.6-1~bpo12+1
[  633.775429] Hardware name: Micro Computer (HK) Tech Limited UM773 
Lite/F7BFD, BIOS 1.06 02/27/2023
[  633.775430] Call Trace:
[  633.775432]  <TASK>
[  633.775434]  dump_stack_lvl+0x64/0x80
[  633.775440]  dump_header+0x44/0x1b0
[  633.775444]  oom_kill_process+0xfa/0x200
[  633.775447]  out_of_memory+0x257/0x520
[  633.775450]  __alloc_pages_slowpath.constprop.0+0xaaa/0xd60
[  633.775456]  __alloc_pages_noprof+0x309/0x340
[  633.775460]  alloc_pages_mpol_noprof+0xd9/0x1e0
[  633.775464]  pte_alloc_one+0x1d/0x60
[  633.775468]  __pte_alloc+0x2a/0xb0
[  633.775472]  do_anonymous_page+0x52b/0x7b0
[  633.775474]  ? lruvec_stat_mod_folio.constprop.0+0x1c/0x30
[  633.775476]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.775479]  ? __pmd_alloc+0x148/0x200
[  633.775481]  __handle_mm_fault+0xc3e/0x1070
[  633.775484]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.775488]  handle_mm_fault+0x190/0x320
[  633.775491]  hmm_vma_fault.isra.0+0x4d/0x90
[  633.775495]  walk_pgd_range+0x34d/0xa90
[  633.775500]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.775502]  __walk_page_range+0x198/0x1b0
[  633.775505]  walk_page_range+0x13d/0x200
[  633.775508]  hmm_range_fault+0x5f/0xa0
[  633.775513]  amdgpu_hmm_range_get_pages+0x144/0x260 [amdgpu]
[  633.775717]  amdgpu_ttm_tt_get_user_pages+0xc1/0x1a0 [amdgpu]
[  633.775838]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x50e/0xb40 [amdgpu]
[  633.775991]  kfd_ioctl_alloc_memory_of_gpu+0xd5/0x270 [amdgpu]
[  633.776139]  kfd_ioctl+0x3af/0x4c0 [amdgpu]
[  633.776280]  ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu]
[  633.776422]  __x64_sys_ioctl+0x97/0xd0
[  633.776426]  do_syscall_64+0x82/0x190
[  633.776432]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.776434]  ? vm_mmap_pgoff+0x131/0x1c0
[  633.776437]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.776439]  ? syscall_exit_to_user_mode+0x77/0x210
[  633.776441]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.776443]  ? do_syscall_64+0x8e/0x190
[  633.776445]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.776446]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  633.776450] RIP: 0033:0x7fe8bf3164bb
[  633.776454] Code: Unable to access opcode bytes at 0x7fe8bf316491.
[  633.776455] RSP: 002b:00007ffd9f7cddb0 EFLAGS: 00000246 ORIG_RAX: 
0000000000000010
[  633.776457] RAX: ffffffffffffffda RBX: 00000000c0000004 RCX: 00007fe8bf3164bb
[  633.776459] RDX: 00007ffd9f7cde50 RSI: 00000000c0284b16 RDI: 0000000000000003
[  633.776460] RBP: 00007ffd9f7cde50 R08: 00007ffd9f7cdf48 R09: 00000000c0000004
[  633.776461] R10: 0000000000004022 R11: 0000000000000246 R12: 00000000c0284b16
[  633.776462] R13: 0000000000000003 R14: 00007ffd9f7cdf48 R15: 00007fe8c4a71278
[  633.776466]  </TASK>
[  633.776467] Mem-Info:
[  633.776470] active_anon:10824089 inactive_anon:666438 isolated_anon:0
                active_file:82 inactive_file:164 isolated_file:0
                unevictable:0 dirty:0 writeback:0
                slab_reclaimable:7541 slab_unreclaimable:23829
                mapped:44 shmem:294 pagetables:24401
                sec_pagetables:1054 bounce:0
                kernel_misc_reclaimable:0
                free:37599 free_pcp:19 free_cma:0
[  633.776473] Node 0 active_anon:43296924kB inactive_anon:2665184kB 
active_file:328kB inactive_file:656kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB mapped:176kB dirty:0kB writeback:0kB shmem:1176kB 
shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:44812288kB writeback_tmp:0kB 
kernel_stack:5824kB pagetables:97604kB sec_pagetables:4216kB all_unreclaimable? 
no
[  633.776477] Node 0 DMA free:824kB boost:0kB min:16kB low:28kB high:40kB 
reserved_highatomic:0KB active_anon:80kB inactive_anon:10336kB active_file:0kB 
inactive_file:24kB unevictable:0kB writepending:0kB present:15996kB 
managed:15360kB mlocked:0kB bounce:0kB free_pcp:8kB local_pcp:0kB free_cma:0kB
[  633.776481] lowmem_reserve[]: 0 2725 61975 0 0
[  633.776486] Node 0 DMA32 free:2928kB boost:0kB min:2968kB low:5756kB 
high:8544kB reserved_highatomic:0KB active_anon:264kB inactive_anon:237144kB 
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB 
present:2895008kB managed:2829172kB mlocked:0kB bounce:0kB free_pcp:0kB 
local_pcp:0kB free_cma:0kB
[  633.776490] lowmem_reserve[]: 0 0 59249 0 0
[  633.776494] Node 0 Normal free:146644kB boost:194924kB min:259516kB 
low:320184kB high:380852kB reserved_highatomic:0KB active_anon:43297812kB 
inactive_anon:2416472kB active_file:524kB inactive_file:16kB unevictable:0kB 
writepending:0kB present:61836288kB managed:60679192kB mlocked:0kB bounce:0kB 
free_pcp:68kB local_pcp:0kB free_cma:0kB
[  633.776498] lowmem_reserve[]: 0 0 0 0 0
[  633.776501] Node 0 DMA: 2*4kB (U) 5*8kB (U) 16*16kB (U) 16*32kB (U) 0*64kB 
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 816kB
[  633.776513] Node 0 DMA32: 0*4kB 0*8kB 1*16kB (U) 2*32kB (UM) 1*64kB (M) 
1*128kB (U) 1*256kB (U) 1*512kB (U) 2*1024kB (UM) 0*2048kB 0*4096kB = 3088kB
[  633.776525] Node 0 Normal: 1608*4kB (UE) 2125*8kB (UE) 2767*16kB (UE) 
2436*32kB (UE) 6*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
146040kB
[  633.776537] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=1048576kB
[  633.776538] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=2048kB
[  633.776540] 883 total pagecache pages
[  633.776541] 341 pages in swap cache
[  633.776542] Free swap  = 0kB
[  633.776542] Total swap = 999420kB
[  633.776543] 16186823 pages RAM
[  633.776544] 0 pages HighMem/MovableOnly
[  633.776545] 305892 pages reserved
[  633.776545] 0 pages hwpoisoned
[  633.776546] Tasks state (memory values in pages):
[  633.776547] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file 
rss_shmem pgtables_bytes swapents oom_score_adj name
[  633.776555] [    434]     0   434    24307      224      224        0        
 0   221184        0          -250 systemd-journal
[  633.776559] [    455]     0   455     6760      443      384       59        
 0    77824        0         -1000 systemd-udevd
[  633.776562] [    647]   997   647    22520      286      224       62        
 0    81920        0             0 systemd-timesyn
[  633.776564] [    666]     0   666     1468      226      160       66        
 0    53248        0             0 dhclient
[  633.776566] [    685]     0   685     1652       51        0       51        
 0    57344        0             0 cron
[  633.776568] [    686]   100   686     2309       64       64        0        
 0    57344        0          -900 dbus-daemon
[  633.776571] [    688]     0   688     6202     2453     2432       21        
 0    81920        0             0 gpuenv-server
[  633.776573] [    690]     0   690    38186       61       32       29        
 0    69632        0         -1000 lxcfs
[  633.776575] [    692]     0   692    55447      286      224       62        
 0    94208        0             0 rsyslogd
[  633.776577] [    693]     0   693    12451      250      224       26        
 0   106496        0             0 systemd-logind
[  633.776579] [    698]   104   698      644       38        0       38        
 0    40960        0             0 debci-publisher
[  633.776581] [    715]     0   715     1458       79       32       47        
 0    49152        0             0 lxc-monitord
[  633.776584] [    721]     0   721     1468       82       32       50        
 0    53248        0             0 agetty
[  633.776586] [    724]     0   724   541388     3540     3540        0        
 0   442368      288          -999 containerd
[  633.776588] [    771]     0   771     3858      379      288       91        
 0    69632        0         -1000 sshd
[  633.776590] [    777]   104   777     4786      320      320        0        
 0    77824        0           100 systemd
[  633.776592] [    815]   104   815      669       37        0       37        
 0    45056        0             0 inotifywait
[  633.776595] [    820]   104   820    42286      861      791       70        
 0    98304        0           100 (sd-pam)
[  633.776597] [    897]     0   897   568107     2028     2028        0        
 0   557056     5728             0 dockerd
[  633.776600] [    962]   103   962     3544      138      104       34        
 0    65536        0             0 dnsmasq
[  633.776602] [   1188]     0  1188     4505      396      384       12        
 0    69632        0             0 sshd
[  633.776604] [   1191]  1000  1191     4764      487      448       39        
 0    77824        0           100 systemd
[  633.776606] [   1192]  1000  1192    42286      916      829       87        
 0    98304        0           100 (sd-pam)
[  633.776609] [   1211]  1000  1211     4570      383      328       55        
 0    73728      128             0 sshd
[  633.776611] [   1212]  1000  1212     2058      251      192       59        
 0    57344      192             0 bash
[  633.776613] [   1299]  1000  1299     2577       52       32       20        
 0    61440       64             0 sudo
[  633.776616] [   1300]  1000  1300     2577       94       37       57        
 0    57344       64             0 sudo
[  633.776618] [   1301]     0  1301     2325       46        0       46        
 0    61440       96             0 su
[  633.776620] [   1302]   104  1302      644      109       32       77        
 0    49152        0             0 sh
[  633.776622] [   1305]   104  1305     9771      501      432       69        
 0   118784     4064             0 autopkgtest
[  633.776625] [   1306]   104  1306     1367       61        0       61        
 0    49152        0             0 tee
[  633.776627] [   1307]   104  1307     1367       71        0       71        
 0    53248        0             0 tee
[  633.776629] [   1308]   104  1308     1637       64       32       32        
 0    57344        0             0 mawk
[  633.776631] [   1309]   104  1309     5649      436      311      125        
 0    81920     1824             0 autopkgtest-vir
[  633.776633] [   1339]   104  1339      268       59        0       59        
 0    40960        0             0 catatonit
[  633.776635] [   1361]   104  1361     2247      113       96       17        
 0    53248        0           200 dbus-daemon
[  633.776637] [   1364]   104  1364     2140     1366     1315       51        
 0    57344        0             0 fuse-overlayfs
[  633.776640] [   1367]   104  1367     2212       72       67        5        
 0    57344        0             0 conmon
[  633.776642] [   1370]   104  1370      645       42        0       42        
 0    49152        0             0 sleep
[  633.776644] [   1372]   104  1372     1480      106       56       50        
 0    53248        0             0 slirp4netns
[  633.776646] [   3979]   104  3979   446441      304      304        0        
 0   368640     1984             0 podman
[  633.776648] [   4002]   104  4002   502124     3983     3978        5        
 0   401408        0             0 podman
[  633.776651] [   4022]   104  4022     2212      127       68       59        
 0    53248        0             0 conmon
[  633.776653] [   4025]   104  4025     1015       98       64       34        
 0    57344        0             0 bash
[  633.776655] [   4028]   104  4028     1069       50       32       18        
 0    45056       32             0 su
[  633.776657] [   4029] 166536  4029      669       52       32       20       
  0    45056        0             0 wrapper.sh
[  633.776659] [   4041] 166536  4041      754       36        0       36       
  0    40960        0             0 tee
[  633.776661] [   4043] 166536  4043      754       34        0       34       
  0    45056        0             0 tee
[  633.776663] [   4045] 166536  4045      669       37        0       37       
  0    49152        0             0 sh
[  633.776666] [   4053] 166536  4053 20134547 11468505 11468495       10       
  0 94818304   234720             0 rocfft-test
[  633.776668] 
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-104.slice/user@104.service/user.slice/libpod-577c53714068510df069335c5a4e99b966e187f291e087208c17df3ad5fdb52d.scope/container,task=rocfft-test,pid=4053,uid=166536
[  633.776686] Out of memory: Killed process 4053 (rocfft-test) 
total-vm:80538188kB, anon-rss:45873980kB, file-rss:40kB, shmem-rss:0kB, 
UID:166536 pgtables:92596kB oom_score_adj:0
[  633.779547] rocfft-test: page allocation failure: order:0, 
mode:0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), 
nodemask=(null),cpuset=user.slice,mems_allowed=0
[  633.779554] CPU: 7 PID: 4053 Comm: rocfft-test Not tainted 6.10.6+bpo-amd64 
#1  Debian 6.10.6-1~bpo12+1
[  633.779556] Hardware name: Micro Computer (HK) Tech Limited UM773 
Lite/F7BFD, BIOS 1.06 02/27/2023
[  633.779557] Call Trace:
[  633.779559]  <TASK>
[  633.779560]  dump_stack_lvl+0x64/0x80
[  633.779563]  warn_alloc+0x164/0x1e0
[  633.779567]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.779570]  __alloc_pages_slowpath.constprop.0+0xc7b/0xd60
[  633.779575]  __alloc_pages_noprof+0x309/0x340
[  633.779578]  alloc_pages_mpol_noprof+0xd9/0x1e0
[  633.779582]  vma_alloc_folio_noprof+0x65/0xd0
[  633.779584]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.779586]  do_anonymous_page+0x2b0/0x7b0
[  633.779588]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.779590]  ? __pte_offset_map+0x1b/0x180
[  633.779593]  __handle_mm_fault+0xc3e/0x1070
[  633.779596]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.779600]  handle_mm_fault+0x190/0x320
[  633.779602]  hmm_vma_fault.isra.0+0x4d/0x90
[  633.779605]  walk_pgd_range+0x34d/0xa90
[  633.779609]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.779612]  __walk_page_range+0x198/0x1b0
[  633.779615]  walk_page_range+0x13d/0x200
[  633.779618]  hmm_range_fault+0x5f/0xa0
[  633.779621]  amdgpu_hmm_range_get_pages+0x144/0x260 [amdgpu]
[  633.779791]  amdgpu_ttm_tt_get_user_pages+0xc1/0x1a0 [amdgpu]
[  633.779912]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x50e/0xb40 [amdgpu]
[  633.780064]  kfd_ioctl_alloc_memory_of_gpu+0xd5/0x270 [amdgpu]
[  633.780211]  kfd_ioctl+0x3af/0x4c0 [amdgpu]
[  633.780352]  ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu]
[  633.780495]  __x64_sys_ioctl+0x97/0xd0
[  633.780498]  do_syscall_64+0x82/0x190
[  633.780503]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.780505]  ? vm_mmap_pgoff+0x131/0x1c0
[  633.780508]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.780509]  ? syscall_exit_to_user_mode+0x77/0x210
[  633.780512]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.780513]  ? do_syscall_64+0x8e/0x190
[  633.780515]  ? srso_alias_return_thunk+0x5/0xfbef5
[  633.780517]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  633.780519] RIP: 0033:0x7fe8bf3164bb
[  633.780522] Code: Unable to access opcode bytes at 0x7fe8bf316491.
[  633.780523] RSP: 002b:00007ffd9f7cddb0 EFLAGS: 00000246 ORIG_RAX: 
0000000000000010
[  633.780525] RAX: ffffffffffffffda RBX: 00000000c0000004 RCX: 00007fe8bf3164bb
[  633.780526] RDX: 00007ffd9f7cde50 RSI: 00000000c0284b16 RDI: 0000000000000003
[  633.780527] RBP: 00007ffd9f7cde50 R08: 00007ffd9f7cdf48 R09: 00000000c0000004
[  633.780528] R10: 0000000000004022 R11: 0000000000000246 R12: 00000000c0284b16
[  633.780529] R13: 0000000000000003 R14: 00007ffd9f7cdf48 R15: 00007fe8c4a71278
[  633.780533]  </TASK>
[  633.780534] Mem-Info:
[  633.780536] active_anon:10825835 inactive_anon:665439 isolated_anon:0
                active_file:10 inactive_file:309 isolated_file:0
                unevictable:0 dirty:0 writeback:0
                slab_reclaimable:7541 slab_unreclaimable:23829
                mapped:47 shmem:294 pagetables:24401
                sec_pagetables:1054 bounce:0
                kernel_misc_reclaimable:0
                free:34054 free_pcp:2971 free_cma:0
[  633.780539] Node 0 active_anon:43303340kB inactive_anon:2661756kB 
active_file:40kB inactive_file:1236kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB mapped:188kB dirty:0kB writeback:0kB shmem:1176kB 
shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:44812288kB writeback_tmp:0kB 
kernel_stack:5824kB pagetables:97604kB sec_pagetables:4216kB all_unreclaimable? 
no
[  633.780543] Node 0 DMA free:756kB boost:0kB min:16kB low:28kB high:40kB 
reserved_highatomic:0KB active_anon:96kB inactive_anon:10352kB active_file:28kB 
inactive_file:16kB unevictable:0kB writepending:0kB present:15996kB 
managed:15360kB mlocked:0kB bounce:0kB free_pcp:16kB local_pcp:8kB free_cma:0kB
[  633.780547] lowmem_reserve[]: 0 2725 61975 0 0
[  633.780552] Node 0 DMA32 free:1424kB boost:0kB min:2968kB low:5756kB 
high:8544kB reserved_highatomic:0KB active_anon:0kB inactive_anon:238268kB 
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB 
present:2895008kB managed:2829172kB mlocked:0kB bounce:0kB free_pcp:752kB 
local_pcp:752kB free_cma:0kB
[  633.780556] lowmem_reserve[]: 0 0 59249 0 0
[  633.780560] Node 0 Normal free:134036kB boost:0kB min:64592kB low:125260kB 
high:185928kB reserved_highatomic:0KB active_anon:44073220kB 
inactive_anon:1643160kB active_file:0kB inactive_file:880kB unevictable:0kB 
writepending:0kB present:61836288kB managed:60679192kB mlocked:0kB bounce:0kB 
free_pcp:11364kB local_pcp:7572kB free_cma:0kB
[  633.780563] lowmem_reserve[]: 0 0 0 0 0
[  633.780567] Node 0 DMA: 1*4kB (U) 0*8kB 14*16kB (U) 16*32kB (U) 0*64kB 
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 740kB
[  633.780578] Node 0 DMA32: 1*4kB (M) 0*8kB 0*16kB 0*32kB 1*64kB (M) 1*128kB 
(M) 1*256kB (M) 0*512kB 1*1024kB (M) 0*2048kB 0*4096kB = 1476kB
[  633.780589] Node 0 Normal: 1*4kB (U) 1247*8kB (U) 2773*16kB (UE) 2442*32kB 
(UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 132492kB
[  633.780601] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=1048576kB
[  633.780602] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=2048kB
[  633.780603] 891 total pagecache pages
[  633.780604] 342 pages in swap cache
[  633.780605] Free swap  = 0kB
[  633.780606] Total swap = 999420kB
[  633.780606] 16186823 pages RAM
[  633.780607] 0 pages HighMem/MovableOnly
[  633.780608] 305892 pages reserved
[  633.780609] 0 pages hwpoisoned
[  633.780633] amdgpu: init_user_pages: Failed to get user pages: -14
