On Thu, Mar 13, 2025 at 10:46:53AM +0800, Yi Zhang wrote:
> Hello
>
> I found this issue during ndctl test suite with the latest linux tree,
> please help check it and let me know if you need any test/info for it,
> thanks.

Hi Yi Zhang,

I was able to reproduce a 'similar' failure on both linux-next* and
6.14-rc7 with ndctl v80 by forcing the 'too small after alignment'
failure. I'm not sure that's a true test, but I'm pretty sure it
shouldn't lead to an Oops.

*I was curious about linux-next because it holds pending DAX changes.

The reason I had to fake the failure is that my memory_block_size() of
128MB always got past the align check.

I notice that it fails but doesn't Oops on the initial cxl-test module
load and attempt to configure the dax region, only on the subsequent
daxctl-create.sh.

I'm appending my failure notes and will check back on this when I
return. (Spring Break for the next 10 days :))

Also, just as I went to write this up, I saw this on the list and it
sounds related:
https://lore.kernel.org/linux-cxl/[email protected]

> snip
>
> [10965.520762] kmem dax3.0: mapping0: 0x3ff010000000-0x3ff02fffffff
> too small after alignment
> [10965.529291] kmem dax3.0: rejecting DAX region without any memory
> after alignment
> [10965.536704] kmem dax3.0: probe with driver kmem failed with error -22

My happy alignment with block size 128MB:

[ ] kmem dax0.0: ALISON mapping0: 0x3ff010000000-0x3ff02fffffff Good after alignment
[ ] kmem dax0.0: ALISON memory block size bytes:0x8000000
[ ] kmem dax0.0: ALISON fake failure here
[ ] kmem dax0.0: rejecting DAX region without any memory after alignment
[ ] kmem dax0.0: probe with driver kmem failed with error -22

Appending the stack trace, which is different from yours.

[ 100.061790] BUG: unable to handle page fault for address: ffffeaffc0400034
[ 100.063933] #PF: supervisor write access in kernel mode
[ 100.065654] #PF: error_code(0x0002) - not-present page
[ 100.067357] PGD 0 P4D 0
[ 100.068485] Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 100.070156] CPU: 2 UID: 0 PID: 1225 Comm: daxctl Tainted: G O 6.14.0-rc7+ #3
[ 100.072602] Tainted: [O]=OOT_MODULE
[ 100.073999] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 100.076031] RIP: 0010:__init_zone_device_page+0x16/0xb0
[ 100.077245] Code: 83 eb bf cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 c1 e2 38 48 89 f0 55 48 c1 e1 3a 48 be 00 00 00 00 00 00 00 03 <c7> 47 34 01 00 00 00 48 21 f2 c7 47 30 ff ff ff ff 48 09 ca 48 89
[ 100.080886] RSP: 0018:ffffc90002afbab0 EFLAGS: 00010246
[ 100.082146] RAX: 00000003ff010000 RBX: ffffeaffc0400000 RCX: 0000000000000000
[ 100.083683] RDX: 0300000000000000 RSI: 0300000000000000 RDI: ffffeaffc0400000
[ 100.085198] RBP: ffffc90002afbb10 R08: ffff888033f7ee28 R09: ffff88807fe4ac40
[ 100.086741] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 100.087843] R13: ffff888033f7ee28 R14: 00000003ff030000 R15: 00000003ff010000
[ 100.088946] FS: 00007f46a8a147c0(0000) GS:ffff88807d900000(0000) knlGS:0000000000000000
[ 100.090117] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 100.091079] CR2: ffffeaffc0400034 CR3: 000000005041a002 CR4: 0000000000370ef0
[ 100.092173] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 100.093251] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 100.094328] Call Trace:
[ 100.094937] <TASK>
[ 100.095465] ? show_regs+0x5f/0x70
[ 100.095987] ? __die+0x1f/0x70
[ 100.096469] ? page_fault_oops+0x14b/0x440
[ 100.097036] ? __init_zone_device_page+0x16/0xb0
[ 100.097680] ? search_exception_tables+0x5b/0x60
[ 100.098271] ? fixup_exception+0x22/0x300
[ 100.098886] ? kernelmode_fixup_or_oops.constprop.0+0x40/0x50
[ 100.099593] ? __bad_area_nosemaphore+0x15e/0x240
[ 100.100193] ? bad_area_nosemaphore+0x11/0x20
[ 100.100825] ? do_kern_addr_fault+0x7a/0x90
[ 100.101376] ? exc_page_fault+0x123/0x230
[ 100.101922] ? asm_exc_page_fault+0x27/0x30
[ 100.102474] ? __init_zone_device_page+0x16/0xb0
[ 100.103065] ? memmap_init_zone_device+0xc1/0x1b0
[ 100.103706] memremap_pages+0x366/0x7c0
[ 100.104222] devm_memremap_pages+0x1d/0x70
[ 100.104803] __wrap_devm_memremap_pages+0xf5/0x190 [nfit_test_iomap]
[ 100.105511] dev_dax_probe+0x1cc/0x380 [device_dax]
[ 100.106111] dax_bus_probe+0x6a/0xa0
[ 100.106671] really_probe+0xd7/0x390
[ 100.107165] __driver_probe_device+0xc4/0x150
[ 100.107765] driver_probe_device+0x1f/0x90
[ 100.108293] __driver_attach+0xd8/0x1d0
[ 100.108859] ? __pfx___driver_attach+0x10/0x10
[ 100.109413] bus_for_each_dev+0x65/0xb0
[ 100.109931] driver_attach+0x19/0x20
[ 100.110415] do_id_store+0xb9/0x200
[ 100.110896] ? check_preemption_disabled+0xb0/0xe0
[ 100.111466] new_id_store+0xe/0x20
[ 100.111939] drv_attr_store+0x1c/0x40
[ 100.112423] sysfs_kf_write+0x44/0x60
[ 100.112913] kernfs_fop_write_iter+0x13a/0x1f0
[ 100.113452] vfs_write+0x25c/0x500
[ 100.113921] ksys_write+0x5c/0xd0
[ 100.114371] __x64_sys_write+0x14/0x20
[ 100.114859] x64_sys_call+0x1f0d/0x1f80
[ 100.115331] do_syscall_64+0x64/0x140
[ 100.115819] entry_SYSCALL_64_after_hwframe+0x71/0x79
[ 100.116374] RIP: 0033:0x7f46a8901c37
[ 100.116830] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 100.118617] RSP: 002b:00007ffc4b4da068 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 100.119370] RAX: ffffffffffffffda RBX: 00007ffc4b4da5f8 RCX: 00007f46a8901c37
[ 100.120094] RDX: 0000000000000007 RSI: 0000000015c1b9d6 RDI: 0000000000000004
[ 100.120894] RBP: 00007ffc4b4da0a0 R08: 0000000015c1b920 R09: 0000000000000073
[ 100.121629] R10: 00007f46a8807140 R11: 0000000000000246 R12: 0000000000000000
[ 100.122340] R13: 00007ffc4b4da630 R14: 0000000000414da0 R15: 00007f46a8b2b000
[ 100.123057] </TASK>
[ 100.123404] Modules linked in: cxl_test(O) cxl_mem(O) cxl_pmem(O) cxl_acpi(O) cxl_port(O) cxl_mock(O) device_dax(O) kmem nd_pmem(O) nd_btt(O) dax_pmem(O) dax_cxl nd_e820(O) nfit(O) cxl_mock_mem(O) libnvdimm(O) nfit_test_iomap(O) cxl_core(O) [last unloaded: cxl_mock(O)]
[ 100.125586] CR2: ffffeaffc0400034
[ 100.126051] ---[ end trace 0000000000000000 ]---
[ 100.126647] RIP: 0010:__init_zone_device_page+0x16/0xb0
[ 100.127240] Code: 83 eb bf cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 c1 e2 38 48 89 f0 55 48 c1 e1 3a 48 be 00 00 00 00 00 00 00 03 <c7> 47 34 01 00 00 00 48 21 f2 c7 47 30 ff ff ff ff 48 09 ca 48 89
[ 100.129112] RSP: 0018:ffffc90002afbab0 EFLAGS: 00010246
[ 100.129760] RAX: 00000003ff010000 RBX: ffffeaffc0400000 RCX: 0000000000000000
[ 100.130503] RDX: 0300000000000000 RSI: 0300000000000000 RDI: ffffeaffc0400000
[ 100.131244] RBP: ffffc90002afbb10 R08: ffff888033f7ee28 R09: ffff88807fe4ac40
[ 100.132027] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 100.132858] R13: ffff888033f7ee28 R14: 00000003ff030000 R15: 00000003ff010000
[ 100.133659] FS: 00007f46a8a147c0(0000) GS:ffff88807d900000(0000) knlGS:0000000000000000
[ 100.134478] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 100.135136] CR2: ffffeaffc0400034 CR3: 000000005041a002 CR4: 0000000000370ef0
[ 100.135927] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 100.136747] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 100.137505] note: daxctl[1225] exited with irqs disabled

> [10966.419131] BUG: unable to handle page fault for address: ffffeaffc0400000
> [10966.426011] #PF: supervisor write access in kernel mode
> [10966.431234] #PF: error_code(0x0002) - not-present page
> [10966.436374] PGD 0 P4D 0
> [10966.438913] Oops: Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI
> [10966.444315] CPU: 1 UID: 0 PID: 21254
Comm: daxctl Tainted: G
> O 6.14.0-rc6+ #2
> [10966.452832] Tainted: [O]=OOT_MODULE
> [10966.456323] Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS
> 2.22.2 09/12/2024
> [10966.463891] RIP: 0010:memset_orig+0x33/0xb0
> [10966.468084] Code: b6 ce 48 b8 01 01 01 01 01 01 01 01 48 0f af c1
> 41 89 f9 41 83 e1 07 75 70 48 89 d1 48 c1 e9 06 74 35 0f 1f 44 00 00
> 48 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48
> 89 47
> [10966.486831] RSP: 0018:ffffc90005ddf3a8 EFLAGS: 00010216
> [10966.492055] RAX: ffffffffffffffff RBX: 0000000000200000 RCX:
> 0000000000007fff
> [10966.499189] RDX: 0000000000200000 RSI: 00000000ffffffff RDI:
> ffffeaffc0400000
> [10966.506320] RBP: 0000000000000000 R08: 0000000000000001 R09:
> 0000000000000000
> [10966.513453] R10: ffffeaffc0400000 R11: 0000000000000000 R12:
> 000000000007fe02
> [10966.520584] R13: 0000000000000ffc R14: 0000000000007fe0 R15:
> 00000003ff010000
> [10966.527717] FS: 00007f59922b97c0(0000) GS:ffff88901f400000(0000)
> knlGS:0000000000000000
> [10966.535802] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [10966.541551] CR2: ffffeaffc0400000 CR3: 000000032ed88006 CR4:
> 00000000007726f0
> [10966.548680] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [10966.555813] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [10966.562947] PKRU: 55555554
> [10966.565658] Call Trace:
> [10966.568112] <TASK>
> [10966.570217] ? show_trace_log_lvl+0x1b0/0x2f0
> [10966.574577] ? show_trace_log_lvl+0x1b0/0x2f0
> [10966.578938] ? sparse_add_section+0x2e6/0x740
> [10966.583297] ? __die_body.cold+0x8/0x12
> [10966.587136] ? page_fault_oops+0x15e/0x1e0
> [10966.591235] ? __pfx_page_fault_oops+0x10/0x10
> [10966.595678] ? search_bpf_extables+0x168/0x260
> [10966.600128] ? exc_page_fault+0x10c/0x120
> [10966.604138] ? asm_exc_page_fault+0x26/0x30
> [10966.608327] ? memset_orig+0x33/0xb0
> [10966.611903] sparse_add_section+0x2e6/0x740
> [10966.616091] ? __pfx_sparse_add_section+0x10/0x10
> [10966.620797] __add_pages+0x1ca/0x290
> [10966.624374] add_pages+0x52/0x1c0
> [10966.627693] pagemap_range+0x4ec/0x1070
> [10966.631534] ? __pfx_dev_pagemap_percpu_release+0x10/0x10
> [10966.636932] ? percpu_ref_init+0x12c/0x330
> [10966.641031] memremap_pages+0x2eb/0x700
> [10966.644870] ? __pfx_memremap_pages+0x10/0x10
> [10966.649232] ? __pfx_get_nfit_res+0x10/0x10 [nfit_test_iomap]
> [10966.654984] ? trace_irq_enable.constprop.0+0x151/0x1c0
> [10966.660210] devm_memremap_pages+0x45/0x90
> [10966.664309] dev_dax_probe+0x296/0xa90 [device_dax]
> [10966.669188] ? __pfx___up_read+0x10/0x10
> [10966.673117] dax_bus_probe+0x106/0x1e0
> [10966.676874] ? driver_sysfs_add+0xfc/0x290
> [10966.680976] really_probe+0x1e0/0x8a0
> [10966.684641] __driver_probe_device+0x18c/0x370
> [10966.689086] driver_probe_device+0x4a/0x120
> [10966.693273] __driver_attach+0x194/0x4a0
> [10966.697199] ? __pfx___driver_attach+0x10/0x10
> [10966.701642] bus_for_each_dev+0x106/0x190
> [10966.705657] ? __pfx_bus_for_each_dev+0x10/0x10
> [10966.710191] do_id_store+0x14c/0x4c0
> [10966.713769] ? __pfx_do_id_store+0x10/0x10
> [10966.717870] ? __pfx_sysfs_kf_write+0x10/0x10
> [10966.722235] kernfs_fop_write_iter+0x39f/0x5a0
> [10966.726683] vfs_write+0x5fa/0xe90
> [10966.730087] ? __pfx_vfs_write+0x10/0x10
> [10966.734016] ? rcu_is_watching+0x15/0xb0
> [10966.737947] ksys_write+0xfa/0x1d0
> [10966.741352] ? __pfx_ksys_write+0x10/0x10
> [10966.745368] do_syscall_64+0x92/0x180
> [10966.749032] ? __x64_sys_openat+0x109/0x1d0
> [10966.753216] ? __pfx___x64_sys_openat+0x10/0x10
> [10966.757749] ? rcu_is_watching+0x15/0xb0
> [10966.761675] ? trace_irq_enable.constprop.0+0x151/0x1c0
> [10966.766902] ? syscall_exit_to_user_mode+0x82/0x250
> [10966.771781] ? do_syscall_64+0x9e/0x180
> [10966.775619] ? __x64_sys_getdents64+0x157/0x240
> [10966.780151] ? __pfx___x64_sys_getdents64+0x10/0x10
> [10966.785030] ? rcu_is_watching+0x15/0xb0
> [10966.788958] ? __pfx_filldir64+0x10/0x10
> [10966.792884] ? rcu_is_watching+0x15/0xb0
> [10966.796808] ? trace_irq_enable.constprop.0+0x151/0x1c0
> [10966.802035] ? syscall_exit_to_user_mode+0x82/0x250
> [10966.806913] ? do_syscall_64+0x9e/0x180
> [10966.810752] ? syscall_exit_to_user_mode+0x82/0x250
> [10966.815633] ? do_syscall_64+0x9e/0x180
> [10966.819472] ? __pfx___call_rcu_common.constprop.0+0x10/0x10
> [10966.825133] ? rcu_is_watching+0x15/0xb0
> [10966.829055] ? trace_irq_enable.constprop.0+0x151/0x1c0
> [10966.834282] ? syscall_exit_to_user_mode+0x82/0x250
> [10966.839163] ? clear_bhb_loop+0x25/0x80
> [10966.842999] ? clear_bhb_loop+0x25/0x80
> [10966.846840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [10966.851893] RIP: 0033:0x7f5992562e14
> [10966.855489] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f
> 84 00 00 00 00 00 f3 0f 1e fa 80 3d 95 d2 0d 00 00 74 13 b8 01 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24
> 18 48
> [10966.874233] RSP: 002b:00007ffee3bc0118 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000001
> [10966.881800] RAX: ffffffffffffffda RBX: 00007ffee3bc06a8 RCX:
> 00007f5992562e14
> [10966.888931] RDX: 0000000000000007 RSI: 00000000042fcd86 RDI:
> 0000000000000004
> [10966.896064] RBP: 00007ffee3bc0150 R08: 0000000000000073 R09:
> 00000000ffffffff
> [10966.903198] R10: 00007f59926b4370 R11: 0000000000000202 R12:
> 0000000000000000
> [10966.910330] R13: 00007ffee3bc06e0 R14: 00007f59926ef000 R15:
> 0000000000414d78
> [10966.917467] </TASK>
> [10966.919653] Modules linked in: cxl_test(O) cxl_mem(O) cxl_pmem(O)
> cxl_acpi(O) cxl_port(O) dax_pmem(O) nd_pmem(O) device_dax(O) dax_cxl
> cxl_mock_mem(O) cxl_mock(O) cxl_core(O) einj ext4 mbcache jbd2 kmem
> rfkill sunrpc intel_rapl_msr intel_rapl_common intel_uncore_frequency
> intel_uncore_frequency_common skx_edac skx_edac_common
> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt
> dell_pc iTCO_vendor_support rapl ipmi_ssif dell_smbios
> platform_profile vfat intel_cstate fat dcdbas intel_uncore
> dell_wmi_descriptor wmi_bmof pcspkr mgag200 tg3 mei_me i2c_i801
> i2c_algo_bit mei i2c_smbus lpc_ich intel_pch_thermal acpi_power_meter
> ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler sg loop fuse nfnetlink
> xfs sd_mod nd_btt(O) ghash_clmulni_intel nd_e820(O) megaraid_sas ahci
> libahci libata wmi nfit(O) libnvdimm(O) nfit_test_iomap(O) dm_mirror
> dm_region_hash dm_log dm_mod
> [10966.919776] Unloaded tainted modules: cxl_port(O):23 cxl_mem(O):23
> cxl_pmem(O):23 cxl_acpi(O):23 cxl_test(O):23 dax_pmem(O):37
> device_dax(O):37 nd_pmem(O):37 nfit_test(O):39 [last unloaded:
> cxl_port(O)]
> [10967.014742] CR2: ffffeaffc0400000
> [10967.018062] ---[ end trace 0000000000000000 ]---
> [10967.080660] RIP: 0010:memset_orig+0x33/0xb0
> [10967.084863] Code: b6 ce 48 b8 01 01 01 01 01 01 01 01 48 0f af c1
> 41 89 f9 41 83 e1 07 75 70 48 89 d1 48 c1 e9 06 74 35 0f 1f 44 00 00
> 48 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48
> 89 47
> [10967.103608] RSP: 0018:ffffc90005ddf3a8 EFLAGS: 00010216
> [10967.108832] RAX: ffffffffffffffff RBX: 0000000000200000 RCX:
> 0000000000007fff
> [10967.115965] RDX: 0000000000200000 RSI: 00000000ffffffff RDI:
> ffffeaffc0400000
> [10967.123098] RBP: 0000000000000000 R08: 0000000000000001 R09:
> 0000000000000000
> [10967.130230] R10: ffffeaffc0400000 R11: 0000000000000000 R12:
> 000000000007fe02
> [10967.137362] R13: 0000000000000ffc R14: 0000000000007fe0 R15:
> 00000003ff010000
> [10967.144494] FS: 00007f59922b97c0(0000) GS:ffff88901f400000(0000)
> knlGS:0000000000000000
> [10967.152579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [10967.158327] CR2: ffffeaffc0400000 CR3: 000000032ed88006 CR4:
> 00000000007726f0
> [10967.165460] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [10967.172593] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [10967.179724] PKRU: 55555554
> [10967.182439] Kernel panic - not syncing: Fatal exception
> [10967.187682] Kernel Offset: 0x20800000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [10967.236024] pstore: backend (erst) writing error (-28)
> [10967.241171] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> --
> Best Regards,
> Yi Zhang
>
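
For anyone following along, here is a minimal sketch of the alignment arithmetic behind the 'too small after alignment' message. It assumes the kmem driver rounds the range start up and the range end down to the memory block size, and rejects the range if nothing is left. That is my reading of the check, not a copy of the kernel code. The helper names are made up for illustration, and the 2 GiB block size is a hypothetical value picked to show a failing case. Only the mapping0 range and the 128MB block size come from the logs above.

```python
# Sketch of memory-block alignment of a DAX range.
# Assumed logic for illustration only; this is not kernel code.

MiB = 1 << 20
GiB = 1 << 30


def align_up(x, a):
    """Round x up to a multiple of a (a must be a power of two)."""
    return (x + a - 1) & ~(a - 1)


def align_down(x, a):
    """Round x down to a multiple of a (a must be a power of two)."""
    return x & ~(a - 1)


def block_align(start, end, block_size):
    """Trim the inclusive range [start, end] to whole memory blocks.

    Returns the trimmed (start, end), or None when no whole block fits,
    which corresponds to the 'too small after alignment' case.
    """
    s = align_up(start, block_size)
    e = align_down(end + 1, block_size) - 1
    if s >= e:
        return None
    return (s, e)


# mapping0 as printed in both logs above
START, END = 0x3FF010000000, 0x3FF02FFFFFFF

# 128MB block size, as on my system: the range is already block aligned.
ok = block_align(START, END, 128 * MiB)

# Hypothetical 2 GiB block size: the 512MB range cannot hold a whole block.
too_small = block_align(START, END, 2 * GiB)

print("128MiB blocks:", ok and tuple(hex(v) for v in ok))
print("2GiB blocks:  ", too_small)
```

With 128MB blocks the range survives untouched, which matches why I had to fake the failure. A larger block size would trim this mapping down to nothing.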
