This is way more complex than I thought, and it is not so easy to address. Let me try to summarize the issue here. While developing the regression tests for ndctl, I kept hitting the same backtrace, over and over, when running the tests:
----
[ 271.705646] memory add fail, invalid altmap
[ 271.705677] WARNING: CPU: 5 PID: 886 at arch/x86/mm/init_64.c:852 add_pages+0x5d/0x70
[ 271.705679] Modules linked in: nls_iso8859_1 edac_mce_amd dax_pmem_compat nd_pmem device_dax nd_btt dax_pmem_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev aesni_intel aes_x86_64 crypto_simd input_leds cryptd glue_helper serio_raw mac_hid qemu_fw_cfg nfit sch_fq_codel ip_tables x_tables autofs4 virtio_net psmouse net_failover virtio_blk i2c_piix4 failover pata_acpi floppy
[ 271.705707] CPU: 5 PID: 886 Comm: ndctl Not tainted 5.3.0-24-generic #26-Ubuntu
[ 271.705709] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 271.705720] RIP: 0010:add_pages+0x5d/0x70
[ 271.705721] Code: 33 c2 01 76 20 48 89 15 99 33 c2 01 48 89 15 a2 33 c2 01 48 c1 e2 0c 48 03 15 97 96 39 01 48 89 15 48 0e c2 01 5b 41 5c 5d c3 <0f> 0b eb ba 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44
[ 271.705722] RSP: 0018:ffffba02c0d2bbf0 EFLAGS: 00010282
[ 271.705723] RAX: 00000000ffffffea RBX: 000000000017ffc0 RCX: 0000000000000000
[ 271.705723] RDX: 0000000000000000 RSI: ffff9aaa3da97448 RDI: ffff9aaa3da97448
[ 271.705724] RBP: ffffba02c0d2bc00 R08: ffff9aaa3da97448 R09: 0000000000000004
[ 271.705724] R10: 0000000000000000 R11: 0000000000000001 R12: 000000000003fe40
[ 271.705725] R13: 0000000000000001 R14: ffffba02c0d2bc48 R15: ffff9aa975efaaf8
[ 271.705727] FS:  00007f70a62d4bc0(0000) GS:ffff9aaa3da80000(0000) knlGS:0000000000000000
[ 271.705728] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 271.705729] CR2: 00005594a0aaa158 CR3: 0000000138110000 CR4: 00000000000406e0
[ 271.705731] Call Trace:
[ 271.705734]  arch_add_memory+0x41/0x50
[ 271.705737]  devm_memremap_pages+0x47c/0x640
[ 271.705740]  pmem_attach_disk+0x173/0x610 [nd_pmem]
[ 271.705741]  ? devm_memremap+0x67/0xa0
[ 271.705743]  nd_pmem_probe+0x7f/0xa0 [nd_pmem]
[ 271.705745]  nvdimm_bus_probe+0x6b/0x170
[ 271.705747]  really_probe+0xfb/0x3a0
[ 271.705749]  driver_probe_device+0x5f/0xe0
[ 271.705750]  device_driver_attach+0x5d/0x70
[ 271.705751]  bind_store+0xd3/0x110
[ 271.705753]  drv_attr_store+0x24/0x30
[ 271.705754]  sysfs_kf_write+0x3e/0x50
[ 271.705755]  kernfs_fop_write+0x11e/0x1a0
[ 271.705757]  __vfs_write+0x1b/0x40
[ 271.705758]  vfs_write+0xb9/0x1a0
[ 271.705759]  ksys_write+0x67/0xe0
[ 271.705760]  __x64_sys_write+0x1a/0x20
[ 271.705762]  do_syscall_64+0x5a/0x130
[ 271.705764]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 271.705765] RIP: 0033:0x7f70a6189327
[ 271.705767] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 271.705767] RSP: 002b:00007ffc616998b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 271.705768] RAX: ffffffffffffffda RBX: 00007f70a62d4ae8 RCX: 00007f70a6189327
[ 271.705769] RDX: 0000000000000007 RSI: 00005594a0aa01a0 RDI: 0000000000000006
[ 271.705769] RBP: 0000000000000006 R08: 0000000000000006 R09: 7375622f7379732f
[ 271.705770] R10: 0000000000000000 R11: 0000000000000246 R12: 00005594a0aa01a0
[ 271.705770] R13: 0000000000000001 R14: 0000000000000007 R15: 00007ffc61699908
[ 271.705772] ---[ end trace 7ee621e68332018c ]---
----

And I realized that I could NOT re-create the SECOND namespace (the first one always worked). First I had to read about how QEMU emulates nvdimms and check why namespaces were not persistent with QEMU's nvdimm emulation; then I had to discover why it looked like virtual nvdimms had no labels (since RAW namespaces are always created by default); and then I had to understand why the mapping was failing, to get to the real issue. First things first.
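For context, attaching a vNVDIMM with a label area to a guest looks roughly like this. This is only a sketch based on the QEMU docs: the backing-file path, device ids and sizes are placeholders, not values from this bug, and remaining options (disk, network, etc.) are omitted.

```shell
# Sketch: boot a guest with one labeled vNVDIMM.
# /var/tmp/nvdimm0.img, mem1/nvdimm1 ids and the sizes are placeholders.
qemu-system-x86_64 \
    -machine pc,nvdimm=on \
    -m 2G,slots=2,maxmem=8G \
    -object memory-backend-file,id=mem1,share=on,mem-path=/var/tmp/nvdimm0.img,size=1G \
    -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K
```

Without the label-size option the guest sees a label-less DIMM, which matches the "no labels" behaviour I describe above.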
### QEMU emulated nvdimms:

https://github.com/qemu/qemu/blob/master/docs/nvdimm.txt

Whenever the backing filesystem is not DAX capable (as it would be on real NVDIMM hardware, for example), all nvdimm data (written to the backing files) is gone after the instance is shut down.

### QEMU virtual nvdimms lack of labels:

From the QEMU docs:

Label
-----

QEMU v2.7.0 and later implement the label support for vNVDIMM devices.
To enable label on vNVDIMM devices, users can simply add "label-size=$SZ"
option to "-device nvdimm", e.g.

    -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K

Note:
1. The minimal label size is 128KB.
2. QEMU v2.7.0 and later store labels at the end of backend storage. If
   a memory backend file, which was previously used as the backend of a
   vNVDIMM device without labels, is now used for a vNVDIMM device with
   label, the data in the label area at the end of file will be
   inaccessible to the guest. If any useful data (e.g. the meta-data of
   the file system) was stored there, the latter usage may result guest
   data corruption (e.g. breakage of guest file system).

### namespace1.0 always failing (with the given backtrace)

This is related to: https://github.com/pmem/ndctl/issues/76

Specifically this comment:
https://github.com/pmem/ndctl/issues/76#issuecomment-440840503

"""
Linux needs 128MB alignment for each adjacent namespace. There isn't a fix because BIOS has no visibility or responsibility for Linux alignment constraints. Going forward Linux will eventually gain the capability to support fsdax mode with namespaces that collide within a section (128MB) until then the only workarounds are "raw" mode (not useful), or requiring fsdax namespaces to be created with "--align=1GB".

We faced something similar with section collisions with System RAM, but in that case we could interrogate the collision ahead of time. As it stands we don't find out about this collision until its too late.
I'll try to think of something more clever, but the solution may devolve to just teaching the tooling to require large alignments.
"""

As we can see here:

rafaeldtinoco@ndctltest:~$ sudo cat /proc/iomem
...
100000000-13fffffff : System RAM
140000000-17ffbffff : Persistent Memory
  140000000-17ffbffff : namespace0.0
17ffc0000-1bff7ffff : Persistent Memory
  17ffc0000-1bff7ffff : namespace1.0
340000000-3bfffffff : PCI Bus 0000:00

When using 2 nvdimms in QEMU, both regions (and thus namespaces) share a boundary, and there is a 128MB alignment requirement. You can make a RAW namespace work, but no other mode:

----
rafaeldtinoco@ndctltest:~$ sudo ndctl disable-region all
disabled 2 regions
rafaeldtinoco@ndctltest:~$ sudo ndctl zero-labels all
zeroed 2 nmems
rafaeldtinoco@ndctltest:~$ sudo ndctl enable-region all
enabled 2 regions
rafaeldtinoco@ndctltest:~$ sudo ndctl list -N
rafaeldtinoco@ndctltest:~$ sudo ndctl create-namespace -r region0 -m raw
{
  "dev":"namespace0.0",
  "mode":"raw",
  "size":"1023.75 MiB (1073.48 MB)",
  "uuid":"54921448-1043-4779-bd77-bb77f70b11eb",
  "sector_size":512,
  "blockdev":"pmem0"
}
rafaeldtinoco@ndctltest:~$ sudo ndctl create-namespace -r region1 -m raw
{
  "dev":"namespace1.0",
  "mode":"raw",
  "size":"1023.75 MiB (1073.48 MB)",
  "uuid":"c5d32b36-c4b4-4c37-a401-0209e2b2e58a",
  "sector_size":512,
  "blockdev":"pmem1"
}
----

but if I try another namespace mode:

----
rafaeldtinoco@ndctltest:~$ sudo ndctl disable-region all
disabled 2 regions
rafaeldtinoco@ndctltest:~$ sudo ndctl zero-labels all
zeroed 2 nmems
rafaeldtinoco@ndctltest:~$ sudo ndctl enable-region all
enabled 2 regions
rafaeldtinoco@ndctltest:~$ sudo ndctl list -N
rafaeldtinoco@ndctltest:~$ sudo ndctl create-namespace -r region0 -m fsdax
{
  "dev":"namespace0.0",
  "mode":"fsdax",
  "map":"dev",
  "size":"1004.00 MiB (1052.77 MB)",
  "uuid":"5c8e1059-2714-4e9a-b47f-33bb617d4489",
  "sector_size":512,
  "align":2097152,
  "blockdev":"pmem0"
}
rafaeldtinoco@ndctltest:~$ sudo ndctl create-namespace -r region1 -m fsdax
libndctl: ndctl_pfn_enable: pfn1.0: failed to enable
  Error: namespace1.0: failed to enable

failed to create namespace: No such device or address
----

I hit the boundary problem. SeaBIOS could be fixed to guarantee correct alignment, as suggested in:
https://github.com/pmem/ndctl/issues/76#issuecomment-440848371

Since the kernel already takes care of the issue:
https://github.com/0day-ci/linux/commit/e50ad2650daecc1135bb28befd278fa291b6afe9

it looks like QEMU would have to address this alignment itself. For now, the ndctl tests being written for:
https://bugs.launchpad.net/ubuntu/+source/ndctl/+bug/1853506

will have to deal with a single virtual nvdimm.

** Bug watch added: github.com/pmem/ndctl/issues #76
   https://github.com/pmem/ndctl/issues/76

** Summary changed:

- qemu nvdimm virtualization + linux 5.3.0-24-generic kernel PROBE ERROR
+ QEMU emulated nvdimm regions alignment need (128MB) or ndctl create-namespace namespace1.0 might fail

** Changed in: linux (Ubuntu)
       Status: Confirmed => Fix Released

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
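The collision described in my comment above can be verified by hand: a namespace start address from /proc/iomem must be a multiple of the 128MB section size. A small bash sketch, using the two start addresses shown earlier (namespace0.0 at 140000000 and namespace1.0 at 17ffc0000):

```shell
#!/bin/bash
# 128MB memory hotplug section size (0x8000000).
SECTION=$((128 * 1024 * 1024))

aligned() {
    # $1: physical address in hex, as printed by /proc/iomem (no 0x prefix)
    local addr=$((0x$1))
    if [ $((addr % SECTION)) -eq 0 ]; then
        echo "$1: aligned"
    else
        echo "$1: NOT aligned"
    fi
}

aligned 140000000   # namespace0.0 start -> prints "140000000: aligned"
aligned 17ffc0000   # namespace1.0 start -> prints "17ffc0000: NOT aligned"
```

namespace1.0 starting at a non-section-aligned address is exactly what makes add_pages() reject the mapping. Per the upstream comment quoted earlier, creating fsdax namespaces with "--align 1G" (instead of the default 2M) is the available workaround until the kernel handles sub-section collisions.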
https://bugs.launchpad.net/bugs/1855177

Title:
  QEMU emulated nvdimm regions alignment need (128MB) or ndctl
  create-namespace namespace1.0 might fail

Status in linux package in Ubuntu:
  Fix Released
Status in ndctl package in Ubuntu:
  Confirmed
Status in qemu package in Ubuntu:
  Confirmed
Status in linux source package in Focal:
  Fix Released
Status in ndctl source package in Focal:
  Confirmed
Status in qemu source package in Focal:
  Confirmed

Bug description:
  I got a probe error for pfn1.0 (from both pfn0.0 and pfn1.0) when
  dealing with ndctl:

----
[11257.765457] memory add fail, invalid altmap
[11257.765489] WARNING: CPU: 6 PID: 5680 at arch/x86/mm/init_64.c:852 add_pages+0x5d/0x70
[11257.765489] Modules linked in: nls_iso8859_1 edac_mce_amd crct10dif_pclmul crc32_pclmul dax_pmem_compat device_dax dax_pmem_core nd_pmem nd_btt ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper input_leds joydev mac_hid nfit serio_raw qemu_fw_cfg sch_fq_codel ip_tables x_tables autofs4 virtio_net net_failover psmouse failover pata_acpi virtio_blk i2c_piix4 floppy
[11257.765505] CPU: 6 PID: 5680 Comm: ndctl Not tainted 5.3.0-24-generic #26-Ubuntu
[11257.765505] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[11257.765507] RIP: 0010:add_pages+0x5d/0x70
[11257.765509] Code: 33 c2 01 76 20 48 89 15 99 33 c2 01 48 89 15 a2 33 c2 01 48 c1 e2 0c 48 03 15 97 96 39 01 48 89 15 48 0e c2 01 5b 41 5c 5d c3 <0f> 0b eb ba 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44
[11257.765509] RSP: 0018:ffffa360c09dfbf0 EFLAGS: 00010282
[11257.765510] RAX: 00000000ffffffea RBX: 000000000017ffe0 RCX: 0000000000000000
[11257.765511] RDX: 0000000000000000 RSI: ffff8acb7db17448 RDI: ffff8acb7db17448
[11257.765512] RBP: ffffa360c09dfc00 R08: ffff8acb7db17448 R09: 0000000000000004
[11257.765512] R10: 0000000000000000 R11: 0000000000000001 R12: 000000000003fe20
[11257.765513] R13: 0000000000000001 R14: ffffa360c09dfc48 R15: ffff8acb7a7226f8
[11257.765515] FS:  00007febc9fd6bc0(0000) GS:ffff8acb7db00000(0000) knlGS:0000000000000000
[11257.765516] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11257.765517] CR2: 000055eec8aab398 CR3: 000000013a8fa000 CR4: 00000000000406e0
[11257.765519] Call Trace:
[11257.765523]  arch_add_memory+0x41/0x50
[11257.765525]  devm_memremap_pages+0x47c/0x640
[11257.765529]  pmem_attach_disk+0x173/0x610 [nd_pmem]
[11257.765531]  ? devm_memremap+0x67/0xa0
[11257.765532]  nd_pmem_probe+0x7f/0xa0 [nd_pmem]
[11257.765542]  nvdimm_bus_probe+0x6b/0x170
[11257.765547]  really_probe+0xfb/0x3a0
[11257.765549]  driver_probe_device+0x5f/0xe0
[11257.765550]  device_driver_attach+0x5d/0x70
[11257.765551]  bind_store+0xd3/0x110
[11257.765553]  drv_attr_store+0x24/0x30
[11257.765554]  sysfs_kf_write+0x3e/0x50
[11257.765555]  kernfs_fop_write+0x11e/0x1a0
[11257.765557]  __vfs_write+0x1b/0x40
[11257.765558]  vfs_write+0xb9/0x1a0
[11257.765559]  ksys_write+0x67/0xe0
[11257.765561]  __x64_sys_write+0x1a/0x20
[11257.765567]  do_syscall_64+0x5a/0x130
[11257.765693]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[11257.765696] RIP: 0033:0x7febc9e81327
[11257.765698] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[11257.765698] RSP: 002b:00007ffd599433f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[11257.765699] RAX: ffffffffffffffda RBX: 00007febc9fd6ae8 RCX: 00007febc9e81327
[11257.765700] RDX: 0000000000000007 RSI: 000055eec8a9bfa0 RDI: 0000000000000004
[11257.765701] RBP: 0000000000000004 R08: 0000000000000006 R09: 7375622f7379732f
[11257.765701] R10: 0000000000000000 R11: 0000000000000246 R12: 000055eec8a9bfa0
[11257.765702] R13: 0000000000000001 R14: 0000000000000007 R15: 00007ffd59943448
[11257.765703] ---[ end trace 442db04e33790cb5 ]---
[11257.782659] nd_pmem: probe of pfn1.0 failed with error -22
----

It seems that after this point I can't play with my second virtual nvdimm
device (pfn1.0). A namespace destroy works, but a namespace creation does not:

rafaeldtinoco@ndctltest:~$ sudo ndctl list -B
[
  {
    "provider":"ACPI.NFIT",
    "dev":"ndbus0"
  }
]
rafaeldtinoco@ndctltest:~$ sudo ndctl list -D
[
  {
    "dev":"nmem1",
    "id":"8680-57341200",
    "handle":2,
    "phys_id":0
  },
  {
    "dev":"nmem0",
    "id":"8680-56341200",
    "handle":1,
    "phys_id":0
  }
]
rafaeldtinoco@ndctltest:~$ sudo ndctl list -R
[
  {
    "dev":"region1",
    "size":1073610752,
    "available_size":1073610752,
    "max_available_extent":1073610752,
    "type":"pmem",
    "iset_id":52512795602891997,
    "persistence_domain":"unknown"
  },
  {
    "dev":"region0",
    "size":1073610752,
    "available_size":0,
    "max_available_extent":0,
    "type":"pmem",
    "iset_id":52512752653219036,
    "persistence_domain":"unknown"
  }
]

Now, whenever trying to access namespace1.0 (from region1/nmem1/ndbus) I get:

[11257.782659] nd_pmem: probe of pfn1.0 failed with error -22
[11332.001388] pfn0.0 initialised, 257024 pages in 8ms
[11332.001818] pmem0: detected capacity change from 0 to 1052770304
[11359.739280] pfn0.1 initialised, 257024 pages in 0ms
[11362.643212] pfn0.0 initialised, 257024 pages in 0ms
[11362.644225] pmem0: detected capacity change from 0 to 1052770304
[11406.230365] pfn0.1 initialised, 257024 pages in 0ms
[11406.231281] pmem0: detected capacity change from 0 to 1052770304
[11517.785147] pfn0.0 initialised, 257024 pages in 4ms
[11517.785593] pmem0: detected capacity change from 0 to 1052770304
[11537.431697] pfn0.1 initialised, 257024 pages in 0ms
[11537.432256] pmem0: detected capacity change from 0 to 1052770304
[11627.965947] pfn0.0 initialised, 257024 pages in 0ms
[11627.966415] pmem0: detected capacity change from 0 to 1052770304
[11653.277667] pfn0.1 initialised, 257024 pages in 4ms
[11653.278086] pmem0: detected capacity change from 0 to 1052770304
[11708.696361] pfn0.0 initialised, 257024 pages in 0ms
[11708.697617] pmem0: detected capacity change from 0 to 1052770304
[11753.621295] nd_pmem btt0.0: No existing arenas
[11753.623118] pmem0s: detected capacity change from 0 to 1071484928
[11767.087424] pfn0.1 initialised, 257024 pages in 4ms
[11767.088272] pmem0: detected capacity change from 0 to 1052770304
[11775.815396] dax0.0 initialised, 257024 pages in 4ms
[12848.341346] pfn0.0 initialised, 257024 pages in 0ms
[12848.341785] pmem0: detected capacity change from 0 to 1052770304
[12851.897716] nd_pmem: probe of pfn1.0 failed with error -22
[13023.693246] pfn0.1 initialised, 257024 pages in 0ms
[13023.693662] pmem0: detected capacity change from 0 to 1052770304
[13026.517467] nd_pmem: probe of pfn1.0 failed with error -22
[13067.380701] pmem0: detected capacity change from 0 to 1073610752
[13117.568499] nd_pmem: probe of pfn1.0 failed with error -22
[13946.604199] pfn0.0 initialised, 257024 pages in 0ms
[13946.604777] pmem0: detected capacity change from 0 to 1052770304
[13957.948381] nd_pmem: probe of pfn1.0 failed with error -22

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1855177/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp