[Bug 1959215] Re: unshare test in ubuntu_stress_smoke_tests triggers "BUG: unable to handle page fault for address: 0000000000001cc8" on Impish with node vought
Found on node vought with F-OEM 5.14.0-1040.44

** Tags added: 5.14 focal oem sru-20220509

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1959215

Title:
  unshare test in ubuntu_stress_smoke_tests triggers "BUG: unable to handle page fault for address: 1cc8" on Impish with node vought

To manage notifications about this bug go to:
https://bugs.launchpad.net/stress-ng/+bug/1959215/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1959215] Re: unshare test in ubuntu_stress_smoke_tests triggers "BUG: unable to handle page fault for address: 0000000000001cc8" on Impish with node vought
Tested with stress-ng V0.13.12:
5.17.1-051701-generic - Not OK
5.17.3-051703-generic - Not OK
5.17.5-051705-generic - Not OK
5.18.0-051800rc1-generic - OK

[ 51.103647] BUG: unable to handle page fault for address: 1cc8
[ 51.103724] #PF: supervisor read access in kernel mode
[ 51.103768] #PF: error_code(0x) - not-present page
[ 51.103810] PGD 0 P4D 0
[ 51.103838] Oops: [#1] PREEMPT SMP NOPTI
[ 51.103876] CPU: 26 PID: 3277 Comm: stress-ng Not tainted 5.17.5-051705-generic #202204271406
[ 51.103943] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0D.01.0395.022720191340 02/27/2019
[ 51.104020] RIP: 0010:__next_zones_zonelist+0x6/0x50
[ 51.104074] Code: 39 d0 0f 4e d0 3d ff 03 00 00 7f 0e 48 63 d2 5d 48 8b 04 d5 a0 ed e4 85 c3 cc 31 c0 5d c3 cc 0f 1f 44 00 00 0f 1f 44 00 00 55 <8b> 4f 08 48 89 f8 48 89 e5 48 85 d2 75 10 eb 1e 48 63 49 50 48 0f
[ 51.104210] RSP: 0018:a28424897aa8 EFLAGS: 00010286
[ 51.104254] RAX: RBX: RCX:
[ 51.104309] RDX: 95c7e17b2e00 RSI: 0002 RDI: 1cc0
[ 51.104365] RBP: a28424897b10 R08: R09:
[ 51.104419] R10: 0002 R11: fffc R12: 00052cc0
[ 51.104474] R13: 0002 R14: 0001 R15: 00152cc0
[ 51.104528] FS: 7faa2a8b7b80() GS:95f38ca8() knlGS:
[ 51.104591] CS: 0010 DS: ES: CR0: 80050033
[ 51.104637] CR2: 1cc8 CR3: 00bd74a92004 CR4: 007706e0
[ 51.104692] DR0: DR1: DR2:
[ 51.104747] DR3: DR6: fffe0ff0 DR7: 0400
[ 51.104802] PKRU: 5554
[ 51.104827] Call Trace:
[ 51.104851]
[ 51.104874] ? __alloc_pages+0x305/0x340
[ 51.104918] kmalloc_large_node+0x45/0xa0
[ 51.104958] ? krc_this_cpu_lock+0x36/0x40
[ 51.104998] __kmalloc_node+0x2f9/0x3e0
[ 51.105036] kvmalloc_node+0x4f/0x90
[ 51.105072] expand_one_shrinker_info+0x82/0x180
[ 51.105116] prealloc_shrinker+0x174/0x1d0
[ 51.105154] alloc_super+0x2b3/0x340
[ 51.105188] ? fput+0x20/0x20
[ 51.105220] sget_fc+0x6f/0x2e0
[ 51.105249] ? kill_litter_super+0x50/0x50
[ 51.105289] ? mqueue_get_tree+0x20/0x20
[ 51.105329] get_tree_keyed+0x34/0xd0
[ 51.105362] mqueue_get_tree+0x1c/0x20
[ 51.105396] vfs_get_tree+0x27/0xc0
[ 51.105431] fc_mount+0x13/0x50
[ 51.105463] mq_init_ns+0x10a/0x1b0
[ 51.105498] copy_ipcs+0x131/0x230
[ 51.105531] create_new_namespaces+0xa6/0x2e0
[ 51.105575] unshare_nsproxy_namespaces+0x5a/0xb0
[ 51.105620] ksys_unshare+0x1a0/0x380
[ 51.106980] ? switch_task_namespaces+0x40/0x70
[ 51.108315] __x64_sys_unshare+0x12/0x20
[ 51.109648] do_syscall_64+0x59/0xc0
[ 51.111359] ? exit_to_user_mode_prepare+0x37/0xb0
[ 51.112920] ? syscall_exit_to_user_mode+0x27/0x50
[ 51.114260] ? __x64_sys_unshare+0x12/0x20
[ 51.115594] ? do_syscall_64+0x69/0xc0
[ 51.116919] ? exit_to_user_mode_prepare+0x37/0xb0
[ 51.118226] ? syscall_exit_to_user_mode+0x27/0x50
[ 51.119490] ? __x64_sys_close+0x11/0x50
[ 51.120708] ? do_syscall_64+0x69/0xc0
[ 51.121876] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 51.122937] RIP: 0033:0x7faa2a9ecfcb
[ 51.123680] Code: 73 01 c3 48 8b 0d 65 1e 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 1e 0f 00 f7 d8 64 89 01 48
[ 51.125272] RSP: 002b:7fffa228d818 EFLAGS: 0246 ORIG_RAX: 0110
[ 51.126105] RAX: ffda RBX: 7fffa228d9c0 RCX: 7faa2a9ecfcb
[ 51.126940] RDX: 55ea47f8e022 RSI: 0800 RDI: 0800
[ 51.127783] RBP: 7fffa228d870 R08: 55ea47f6690e R09: 7faa2a8b7b80
[ 51.128617] R10: R11: 0246 R12: 55ea47f8e022
[ 51.129444] R13: 7fffa228d9c0 R14: 0ca2 R15: 55ea47f7d4e6
[ 51.130262]
[ 51.131070] Modules linked in: kmem device_dax intel_rapl_msr intel_rapl_common isst_if_common nd_pmem dax_pmem nd_btt skx_edac ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm nls_iso8859_1 irdma rapl ice intel_cstate ib_uverbs efi_pstore ib_core joydev input_leds mei_me ioatdma mei intel_pch_thermal dca acpi_ipmi ipmi_si nfit mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc ipmi_devintf scsi_dh_alua ipmi_msghandler msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt crct10dif_pclmul fb_sys_fops crc32_pclmul cec ghash_clmulni_intel rc_core aesni_intel crypto_simd i40e cryptd ahci
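
For anyone reading the register dump in the 5.17.5 oops, the following is a minimal sketch (not part of the original report) of how the faulting address relates to the registers. It decodes the three bytes at the `<8b>` marker in the `Code:` line as a 32-bit load from `[reg + disp8]` and checks that RDI plus the displacement equals CR2. The helper name `decode_mov_disp8` is hypothetical, and it handles only this one instruction encoding.

```python
# Values copied from the 5.17.5 oops in this thread (leading zeros omitted).
RDI = 0x1CC0  # first-argument register at the time of the fault
CR2 = 0x1CC8  # faulting address reported by the CPU
FAULTING_INSN = bytes([0x8B, 0x4F, 0x08])  # bytes starting at the <8b> marker


def decode_mov_disp8(insn: bytes) -> tuple[int, int, int]:
    """Decode 'mov r32, [r64 + disp8]' (opcode 0x8B, ModRM mod=01, no SIB).

    Returns (reg, rm, disp8) using x86 register numbers
    (1 = ecx/rcx, 7 = edi/rdi). Hypothetical helper for illustration only.
    """
    opcode, modrm, disp = insn[0], insn[1], insn[2]
    if opcode != 0x8B or (modrm >> 6) != 0b01:
        raise ValueError("not a 'mov r32, [reg + disp8]' encoding")
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    if rm == 0b100:
        raise ValueError("SIB byte present; not handled in this sketch")
    disp8 = disp - 0x100 if disp >= 0x80 else disp  # sign-extend the displacement
    return reg, rm, disp8


reg, rm, disp8 = decode_mov_disp8(FAULTING_INSN)
fault_address = RDI + disp8
print(f"load into reg {reg} from [reg {rm} + {disp8:#x}] -> {fault_address:#x}")
```

Under these assumptions the instruction is a read of offset 8 from the pointer in RDI, so __next_zones_zonelist appears to have been handed the small invalid pointer 0x1cc0 as its first argument. Where that pointer comes from is not established in this thread.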
[Bug 1959215] Re: unshare test in ubuntu_stress_smoke_tests triggers "BUG: unable to handle page fault for address: 0000000000001cc8" on Impish with node vought
Test passed on 5.18.0-051800rc1-generic with both stress-ng V0.14.0 / V0.13.12

$ sudo ./stress-ng -v -t 5 --unshare 4 --unshare-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable
stress-ng: debug: [27013] stress-ng 0.13.12 gf59bcb2fe1e2
stress-ng: debug: [27013] system: Linux vought 5.18.0-051800rc1-generic #202204032230 SMP PREEMPT_DYNAMIC Sun Apr 3 22:35:34 UTC 2022 x86_64
stress-ng: debug: [27013] RAM total: 351.5G, RAM free: 346.8G, swap free: 8.0G
stress-ng: debug: [27013] 96 processors online, 96 processors configured
stress-ng: info: [27013] setting to a 5 second run per stressor
stress-ng: info: [27013] dispatching hogs: 4 unshare
stress-ng: debug: [27013] cache allocate: shared cache buffer size: 36608K
stress-ng: debug: [27013] starting stressors
stress-ng: debug: [27013] 4 stressors started
stress-ng: debug: [27014] stress-ng-unshare: started [27014] (instance 0)
stress-ng: debug: [27015] stress-ng-unshare: started [27015] (instance 1)
stress-ng: debug: [27016] stress-ng-unshare: started [27016] (instance 2)
stress-ng: debug: [27017] stress-ng-unshare: started [27017] (instance 3)
stress-ng: debug: [27015] stress-ng-unshare: exited [27015] (instance 1)
stress-ng: debug: [27014] stress-ng-unshare: exited [27014] (instance 0)
stress-ng: debug: [27013] process [27014] terminated
stress-ng: debug: [27013] process [27015] terminated
stress-ng: debug: [27016] stress-ng-unshare: exited [27016] (instance 2)
stress-ng: debug: [27017] stress-ng-unshare: exited [27017] (instance 3)
stress-ng: debug: [27013] process [27016] terminated
stress-ng: debug: [27013] process [27017] terminated
stress-ng: info: [27013] successful run completed in 5.10s
stress-ng: debug: [27013] metrics-check: all stressor metrics validated and sane

Now we have something to bisect with.
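
Because the newer kernel (5.18-rc1) passes and the older ones fail, a bisect here would be hunting for the fixing commit rather than the breaking one. A minimal sketch of a log classifier that could back such a run is shown below. The function names are hypothetical, and it assumes the dmesg output from each test boot has already been captured as text. With `git bisect start --term-old=broken --term-new=fixed`, `git bisect run` treats exit code 0 as the "old" state, so this sketch returns 0 while the oops is still present.

```python
import re

# Signature taken from the oops reports in this thread. The optional prefix
# allows for the zero-padded form of the address used in the bug title.
OOPS_SIGNATURE = re.compile(r"BUG: unable to handle page fault for address: \S*1cc8")
FAULT_SYMBOL = "__next_zones_zonelist"


def kernel_hit_bug(dmesg_text: str) -> bool:
    """Return True if the captured log contains this specific oops."""
    return bool(OOPS_SIGNATURE.search(dmesg_text)) and FAULT_SYMBOL in dmesg_text


def bisect_exit_code(dmesg_text: str) -> int:
    """Exit code for 'git bisect run' when searching for the fix.

    Assumes the bisect was started with old=broken and new=fixed:
    0 means the kernel is still broken (old), 1 means it is fixed (new).
    """
    return 0 if kernel_hit_bug(dmesg_text) else 1
```

A wrapper script would build and boot each candidate kernel, run the stress-ng unshare command shown above, collect dmesg, and pass the text to `bisect_exit_code`. Those steps depend on the test rig and are left out here.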
[Bug 1959215] Re: unshare test in ubuntu_stress_smoke_tests triggers "BUG: unable to handle page fault for address: 0000000000001cc8" on Impish with node vought
Tested with the latest mainline kernel, 5.17.0-051700rc8; this issue still exists on node vought.

Mar 14 10:18:25 vought stress-ng: system: 'vought' Linux 5.17.0-051700rc8-generic #202203132130 SMP PREEMPT Sun Mar 13 21:33:37 UTC 2022 x86_64
Mar 14 10:18:25 vought stress-ng: memory (MB): total 359930.86, free 354661.62, shared 2.78, buffer 244.86, swap 9215.99, free swap 9214.57
Mar 14 10:18:25 vought stress-ng: info: [223826] setting to a 5 second run per stressor
Mar 14 10:18:25 vought stress-ng: info: [223826] dispatching hogs: 4 unshare
Mar 14 10:18:25 vought kernel: [ 833.522791] BUG: unable to handle page fault for address: 1cc8
Mar 14 10:18:25 vought kernel: [ 833.522881] #PF: supervisor read access in kernel mode
Mar 14 10:18:25 vought kernel: [ 833.522930] #PF: error_code(0x) - not-present page
Mar 14 10:18:25 vought kernel: [ 833.522980] PGD 0 P4D 0
Mar 14 10:18:25 vought kernel: [ 833.523012] Oops: [#1] PREEMPT SMP NOPTI
Mar 14 10:18:25 vought kernel: [ 833.523059] CPU: 31 PID: 224777 Comm: stress-ng Not tainted 5.17.0-051700rc8-generic #202203132130
Mar 14 10:18:25 vought kernel: [ 833.523143] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0D.01.0395.022720191340 02/27/2019
Mar 14 10:18:25 vought kernel: [ 833.523234] RIP: 0010:__next_zones_zonelist+0x6/0x50
Mar 14 10:18:25 vought kernel: [ 833.523304] Code: d0 0f 4e d0 3d ff 03 00 00 7f 0d 48 63 d2 5d 48 8b 04 d5 a0 04 25 95 c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 <8b> 4f 08 48 89 f8 48 89 e5 48 85 d2 75 10 eb 1d 48 63 49 50 48 0f
Mar 14 10:18:25 vought kernel: [ 833.523466] RSP: 0018:c100351b3b10 EFLAGS: 00010286
Mar 14 10:18:25 vought kernel: [ 833.523518] RAX: RBX: RCX:
Mar 14 10:18:25 vought kernel: [ 833.523584] RDX: a00d6a05ce00 RSI: 0002 RDI: 1cc0
Mar 14 10:18:25 vought kernel: [ 833.523649] RBP: c100351b3b78 R08: R09:
Mar 14 10:18:25 vought kernel: [ 833.523714] R10: 0002 R11: fffc R12: 00052cc0
Mar 14 10:18:25 vought kernel: [ 833.523778] R13: 0002 R14: 0001 R15: 00152cc0
Mar 14 10:18:25 vought kernel: [ 833.523843] FS: 7f6d7078cb80() GS:a0394cbc() knlGS:
Mar 14 10:18:25 vought kernel: [ 833.523917] CS: 0010 DS: ES: CR0: 80050033
Mar 14 10:18:25 vought kernel: [ 833.523971] CR2: 1cc8 CR3: 00bd7cd4e006 CR4: 007706e0
Mar 14 10:18:25 vought kernel: [ 833.524036] DR0: DR1: DR2:
Mar 14 10:18:25 vought kernel: [ 833.524100] DR3: DR6: fffe0ff0 DR7: 0400
Mar 14 10:18:25 vought kernel: [ 833.524164] PKRU: 5554
Mar 14 10:18:25 vought kernel: [ 833.524192] Call Trace:
Mar 14 10:18:25 vought kernel: [ 833.524221]
Mar 14 10:18:25 vought kernel: [ 833.524250] ? __alloc_pages+0x304/0x340
Mar 14 10:18:25 vought kernel: [ 833.524308] kmalloc_large_node+0x45/0xa0
Mar 14 10:18:25 vought kernel: [ 833.524358] ? krc_this_cpu_lock+0x36/0x40
Mar 14 10:18:25 vought kernel: [ 833.524408] __kmalloc_node+0x2f8/0x3e0
Mar 14 10:18:25 vought kernel: [ 833.524452] kvmalloc_node+0x4e/0x90
Mar 14 10:18:25 vought kernel: [ 833.524494] expand_one_shrinker_info+0x82/0x180
Mar 14 10:18:25 vought kernel: [ 833.524545] prealloc_shrinker+0x172/0x1d0
Mar 14 10:18:25 vought kernel: [ 833.524588] alloc_super+0x2b3/0x340
Mar 14 10:18:25 vought kernel: [ 833.524633] ? __fput_sync+0x30/0x30
Mar 14 10:18:25 vought kernel: [ 833.524674] sget_fc+0x6f/0x2e0
Mar 14 10:18:25 vought kernel: [ 833.524706] ? kill_litter_super+0x50/0x50
Mar 14 10:18:25 vought kernel: [ 833.526339] ? mqueue_get_tree+0x20/0x20
Mar 14 10:18:25 vought kernel: [ 833.527918] get_tree_keyed+0x34/0xd0
Mar 14 10:18:25 vought kernel: [ 833.529467] mqueue_get_tree+0x1c/0x20
Mar 14 10:18:25 vought kernel: [ 833.531375] vfs_get_tree+0x27/0xc0
Mar 14 10:18:25 vought kernel: [ 833.533453] fc_mount+0x13/0x50
Mar 14 10:18:25 vought kernel: [ 833.535499] mq_init_ns+0x10a/0x1b0
Mar 14 10:18:25 vought kernel: [ 833.537491] copy_ipcs+0x130/0x230
Mar 14 10:18:25 vought kernel: [ 833.539497] create_new_namespaces+0xa6/0x2e0
Mar 14 10:18:25 vought kernel: [ 833.541416] unshare_nsproxy_namespaces+0x5a/0xb0
Mar 14 10:18:25 vought kernel: [ 833.542729] ksys_unshare+0x19f/0x380
Mar 14 10:18:25 vought kernel: [ 833.544063] __x64_sys_unshare+0x12/0x20
Mar 14 10:18:25 vought kernel: [ 833.545421] do_syscall_64+0x59/0xc0
Mar 14 10:18:25 vought kernel: [ 833.546800] ? do_syscall_64+0x69/0xc0
Mar 14 10:18:25 vought kernel: [ 833.548145] ? asm_exc_page_fault+0x8/0x30
Mar 14 10:18:25 vought kernel: [ 833.549446] entry_SYSCALL_64_after_hwframe+0x44/0xae
Mar 14 10:18:25 vought kernel: [ 833.550789] RIP: 0033:0x7f6d708c1fcb
Mar
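
The traces posted in this thread differ slightly in offsets and in the `?`-prefixed entries, which the kernel marks as unreliable stack guesses. As a rough aid for comparing them, here is a sketch (not from the original report) that extracts only the reliable frames from a captured log. It assumes one log entry per line, as dmesg and syslog emit them, and the name `reliable_frames` is hypothetical.

```python
import re

# Matches the trailing 'symbol+0xoff/0xsize' of a call-trace line and
# captures an optional leading '? ' that marks an unreliable frame.
FRAME = re.compile(r"\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")


def reliable_frames(log_text: str) -> list[str]:
    """Return function names of reliable frames, innermost first.

    Parsing starts after 'Call Trace:' and stops at the first 'RIP:' line
    that follows it (the user-space register dump).
    """
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        if "RIP:" in line:
            break
        match = FRAME.search(line)
        if match and match.group(1) is None:  # skip '?' (unreliable) entries
            frames.append(match.group(2))
    return frames
```

Applied to the 5.13, 5.17-rc8, and 5.17.5 traces here, this should make it easier to confirm that they share the same path from the unshare syscall through mq_init_ns and alloc_super down to kmalloc_large_node.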
[Bug 1959215] Re: unshare test in ubuntu_stress_smoke_tests triggers "BUG: unable to handle page fault for address: 0000000000001cc8" on Impish with node vought
In a duplicate report (bug 1962551), Colin mentioned that this could be a kernel issue. The issue can also be reproduced on Focal OEM 5.14.0-1025.27 with Intel node vought, and on Jammy 5.15.0-22-generic.
[Bug 1959215] Re: unshare test in ubuntu_stress_smoke_tests triggers "BUG: unable to handle page fault for address: 0000000000001cc8" on Impish with node vought
Here is the test log:

** Also affects: linux (Ubuntu Impish)
Importance: Undecided
Status: New

** Also affects: ubuntu-kernel-tests
Importance: Undecided
Status: New

** Description changed:

Issue found on Intel node "vought" with:
- * 5.13.0-28.31
- * 5.13.0-27
- * And possibly the 5.13.0-23 from the last cycle (this test didn't finish properly back then). For more earlier Impish kernels, this system was not tested with this test on them.
+ * 5.13.0-28.31
+ * 5.13.0-27
+ * And possibly the 5.13.0-23 from the last cycle (this test didn't finish properly and marked as "Incomplete" back then, just like this cycle). For more earlier Impish kernels, this system was not tested with this test on them.

The test will hang with unshare test in ubuntu_stress_smoke_tests:
12:39:39 DEBUG| [stdout] udp RETURNED 0
12:39:39 DEBUG| [stdout] udp PASSED
12:39:39 DEBUG| [stdout] udp-flood STARTING
12:39:41 DEBUG| [stdout] udp-flood RETURNED 0
12:39:41 DEBUG| [stdout] udp-flood PASSED
12:39:41 DEBUG| [stdout] unshare STARTING
(hangs here)

And stop responding.

Error can be found in dmesg:
[ 2371.109961] BUG: unable to handle page fault for address: 1cc8
[ 2371.110074] #PF: supervisor read access in kernel mode
[ 2371.114323] #PF: error_code(0x) - not-present page
- [ 2371.119931] PGD 0 P4D 0
+ [ 2371.119931] PGD 0 P4D 0
[ 2371.125257] Oops: [#1] SMP NOPTI
[ 2371.129247] CPU: 51 PID: 207256 Comm: stress-ng Tainted: P O 5.13.0-27-generic #29-Ubuntu
[ 2371.133203] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0D.01.0395.022720191340 02/27/2019
[ 2371.135887] RIP: 0010:__next_zones_zonelist+0x6/0x50
[ 2371.138525] Code: d0 0f 4e d0 3d ff 03 00 00 7f 0d 48 63 d2 5d 48 8b 04 d5 60 e5 35 af c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 <8b> 4f 08 48 89 f8 48 89 e5 48 85 d2 75 10 eb 1d 48 63 49 50 48 0f
[ 2371.143813] RSP: 0018:a9c8b399fac0 EFLAGS: 00010282
[ 2371.146078] RAX: RBX: RCX:
[ 2371.148293] RDX: 9c98e894ea98 RSI: 0002 RDI: 1cc0
[ 2371.150477] RBP: a9c8b399fb28 R08: R09:
[ 2371.152650] R10: 0002 R11: d9bfbfcc5600 R12: 00052cc0
[ 2371.154778] R13: 0002 R14: 0001 R15: 00152cc0
[ 2371.156876] FS: 7fcbd141d740() GS:9cc14ccc() knlGS:
[ 2371.158936] CS: 0010 DS: ES: CR0: 80050033
[ 2371.160958] CR2: 1cc8 CR3: 00059f292001 CR4: 007706e0
[ 2371.162950] DR0: DR1: DR2:
[ 2371.164888] DR3: DR6: fffe0ff0 DR7: 0400
[ 2371.166811] PKRU: 5554
[ 2371.168694] Call Trace:
[ 2371.170544] ? __alloc_pages+0x2f1/0x330
[ 2371.172386] kmalloc_large_node+0x45/0xb0
[ 2371.174222] __kmalloc_node+0x276/0x300
[ 2371.176036] ? queue_delayed_work_on+0x39/0x60
[ 2371.177853] kvmalloc_node+0x5a/0x90
[ 2371.179622] expand_one_shrinker_info+0x82/0x190
[ 2371.181382] prealloc_shrinker+0x175/0x1d0
[ 2371.183091] alloc_super+0x2bf/0x330
[ 2371.184764] ? __fput_sync+0x30/0x30
[ 2371.186384] sget_fc+0x74/0x2e0
[ 2371.187951] ? set_anon_super+0x50/0x50
[ 2371.189473] ? mqueue_create+0x20/0x20
[ 2371.190944] get_tree_keyed+0x34/0xd0
[ 2371.192363] mqueue_get_tree+0x1c/0x20
[ 2371.193734] vfs_get_tree+0x2a/0xc0
[ 2371.195105] fc_mount+0x13/0x50
[ 2371.196409] mq_init_ns+0x10a/0x1b0
[ 2371.197667] copy_ipcs+0x130/0x220
[ 2371.198899] create_new_namespaces+0xa6/0x2e0
[ 2371.200113] unshare_nsproxy_namespaces+0x5a/0xb0
[ 2371.201303] ksys_unshare+0x1db/0x3c0
[ 2371.202480] __x64_sys_unshare+0x12/0x20
[ 2371.203649] do_syscall_64+0x61/0xb0
[ 2371.204804] ? exit_to_user_mode_loop+0xec/0x160
[ 2371.205966] ? exit_to_user_mode_prepare+0x37/0xb0
[ 2371.207102] ? syscall_exit_to_user_mode+0x27/0x50
[ 2371.208222] ? __x64_sys_close+0x11/0x40
[ 2371.209336] ? do_syscall_64+0x6e/0xb0
[ 2371.210438] ? asm_exc_page_fault+0x8/0x30
[ 2371.211545] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2371.212641] RIP: 0033:0x7fcbd1562c4b
[ 2371.213698] Code: 73 01 c3 48 8b 0d e5 e1 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 e1 0e 00 f7 d8 64 89 01 48
[ 2371.215851] RSP: 002b:7ffc5d8eb878 EFLAGS: 0246 ORIG_RAX: 0110
[ 2371.216846] RAX: ffda RBX: 7ffc5d8eba20 RCX: 7fcbd1562c4b
[ 2371.217830] RDX: 560296049862 RSI: 0800 RDI: 0800
[ 2371.218886] RBP: 7ffc5d8eb8d0 R08: 5602960234a2 R09: 7fcbd141d740
[ 2371.219908] R10: R11: 0246 R12: