Alright, if anyone else runs into this, the label size may be the one to
blame.

Judging by the QEMU NVDIMM documentation:

https://github.com/qemu/qemu/blob/master/docs/nvdimm.txt

Note:

1. The minimal label size is 128KB.

2. QEMU v2.7.0 and later store labels at the end of backend storage.
   If a memory backend file, which was previously used as the backend
   of a vNVDIMM device without labels, is now used for a vNVDIMM
   device with label, the data in the label area at the end of file
   will be inaccessible to the guest. If any useful data (e.g. the
   meta-data of the file system) was stored there, the latter usage
   may result in guest data corruption (e.g. breakage of guest file
   system).

=> 128KB was not enough. Changing the label area to 2MB "fixed" the issue.
The funny thing is that I'm not even trying to use labels; I'm using full
regions for namespaces, but there is likely a single label in those cases
(written at the end of the backing files).
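
The sizes in this report are consistent with the label area being carved
out of the end of the backend: `ndctl list -R` reports each region as
1073610752 bytes, which is exactly 1 GiB minus 128 KiB. A minimal Python
sketch of that arithmetic, assuming 1 GiB backing files (the backing size
is not stated in this report). It also checks the result against the
2 MiB `align` value ndctl prints; whether that alignment is what actually
triggers the probe failure is only a guess.

```python
# Sketch: region size left after QEMU reserves the label area at the end of
# the backend file. ASSUMPTION: 1 GiB backing files (not stated in the report).
KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB

BACKING_SIZE = 1 * GiB  # assumed size of each memory-backend-file
ALIGN = 2 * MiB         # "align":2097152 from the ndctl create-namespace output


def region_size(backing: int, label: int) -> int:
    """Bytes left for the region after the label area is carved off the end."""
    return backing - label


if __name__ == "__main__":
    for label in (128 * KiB, 2 * MiB):
        size = region_size(BACKING_SIZE, label)
        print(f"label={label // KiB} KiB region={size} "
              f"2MiB-aligned={size % ALIGN == 0}")
```

With the 128 KiB label this reproduces the 1073610752 seen below, which
is not a multiple of 2 MiB; with a 2 MiB label the region becomes
1071644672 bytes (511 x 2 MiB).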

=> I was also creating the backing files with truncate (i.e. sparse
files); now I'm creating fully zero-filled files. I guess that, given the
memory-mapped nature of DAX and PMEM, having fully allocated files is
either better or mandatory.
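
For reference, the difference between the two ways of creating a backing
file, sketched in Python: extending a file with truncate only sets its
length (on most Linux filesystems no data blocks are allocated), while
writing zeros allocates every block up front. From a shell, `truncate -s`
versus `dd if=/dev/zero` gives the same two outcomes. The file names and
the 16 MiB size are hypothetical, and this only illustrates the
distinction; it does not establish which form QEMU requires.

```python
import os
import tempfile


def create_sparse_file(path: str, size: int) -> None:
    """Like `truncate -s SIZE FILE`: sets the length without writing data."""
    with open(path, "wb") as f:
        f.truncate(size)


def create_zeroed_file(path: str, size: int, chunk: int = 1 << 20) -> None:
    """Like `dd if=/dev/zero`: writes every byte so the blocks get allocated."""
    zeros = bytes(chunk)
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def allocated_bytes(path: str) -> int:
    """On-disk allocation; st_blocks counts 512-byte units (POSIX only)."""
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    size = 16 << 20  # 16 MiB demo; a real backing file is the full DIMM size
    with tempfile.TemporaryDirectory() as d:
        for name, make in (("sparse.img", create_sparse_file),
                           ("zeroed.img", create_zeroed_file)):
            path = os.path.join(d, name)
            make(path, size)
            print(name, os.path.getsize(path), allocated_bytes(path))
```

Both files report the same length; on ext4 or xfs only the zero-filled one
shows its full size as allocated. Filesystems that compress zeros may
report less.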

rafaeldtinoco@ndctltest:~$ sudo ndctl disable-namespace all
disabled 2 namespaces

rafaeldtinoco@ndctltest:~$ sudo ndctl create-namespace -v -r region1 -m fsdax 
{
  "dev":"namespace1.0",
  "mode":"fsdax",
  "map":"dev",
  "size":"1006.00 MiB (1054.87 MB)",
  "uuid":"51dec4e0-1414-418a-9263-6459d5c12194",
  "sector_size":512,
  "align":2097152,
  "blockdev":"pmem1"
}

rafaeldtinoco@ndctltest:~$ sudo ndctl create-namespace -v -r region0 -m fsdax 
{
  "dev":"namespace0.0",
  "mode":"fsdax",
  "map":"dev",
  "size":"1006.00 MiB (1054.87 MB)",
  "uuid":"33be9543-603c-4a7f-9f2d-98a3f3ff5ec0",
  "sector_size":512,
  "align":2097152,
  "blockdev":"pmem0"
}


** Changed in: linux (Ubuntu Focal)
       Status: Confirmed => Invalid

** Changed in: qemu (Ubuntu Focal)
       Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855177

Title:
  qemu nvdimm virtualization + linux 5.3.0-24-generic kernel PROBE ERROR

Status in linux package in Ubuntu:
  Invalid
Status in qemu package in Ubuntu:
  Invalid
Status in linux source package in Focal:
  Invalid
Status in qemu source package in Focal:
  Invalid

Bug description:
  I got a probe error for pfn1.0 (of the two devices, pfn0.0 and pfn1.0)
  when working with ndctl:

  ----
  [11257.765457] memory add fail, invalid altmap
  [11257.765489] WARNING: CPU: 6 PID: 5680 at arch/x86/mm/init_64.c:852 add_pages+0x5d/0x70
  [11257.765489] Modules linked in: nls_iso8859_1 edac_mce_amd crct10dif_pclmul crc32_pclmul dax_pmem_compat device_dax dax_pmem_core nd_pmem nd_btt ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper input_leds joydev mac_hid nfit serio_raw qemu_fw_cfg sch_fq_codel ip_tables x_tables autofs4 virtio_net net_failover psmouse failover pata_acpi virtio_blk i2c_piix4 floppy
  [11257.765505] CPU: 6 PID: 5680 Comm: ndctl Not tainted 5.3.0-24-generic #26-Ubuntu
  [11257.765505] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  [11257.765507] RIP: 0010:add_pages+0x5d/0x70
  [11257.765509] Code: 33 c2 01 76 20 48 89 15 99 33 c2 01 48 89 15 a2 33 c2 01 48 c1 e2 0c 48 03 15 97 96 39 01 48 89 15 48 0e c2 01 5b 41 5c 5d c3 <0f> 0b eb ba 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44
  [11257.765509] RSP: 0018:ffffa360c09dfbf0 EFLAGS: 00010282
  [11257.765510] RAX: 00000000ffffffea RBX: 000000000017ffe0 RCX: 0000000000000000
  [11257.765511] RDX: 0000000000000000 RSI: ffff8acb7db17448 RDI: ffff8acb7db17448
  [11257.765512] RBP: ffffa360c09dfc00 R08: ffff8acb7db17448 R09: 0000000000000004
  [11257.765512] R10: 0000000000000000 R11: 0000000000000001 R12: 000000000003fe20
  [11257.765513] R13: 0000000000000001 R14: ffffa360c09dfc48 R15: ffff8acb7a7226f8
  [11257.765515] FS:  00007febc9fd6bc0(0000) GS:ffff8acb7db00000(0000) knlGS:0000000000000000
  [11257.765516] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [11257.765517] CR2: 000055eec8aab398 CR3: 000000013a8fa000 CR4: 00000000000406e0
  [11257.765519] Call Trace:
  [11257.765523]  arch_add_memory+0x41/0x50
  [11257.765525]  devm_memremap_pages+0x47c/0x640
  [11257.765529]  pmem_attach_disk+0x173/0x610 [nd_pmem]
  [11257.765531]  ? devm_memremap+0x67/0xa0
  [11257.765532]  nd_pmem_probe+0x7f/0xa0 [nd_pmem]
  [11257.765542]  nvdimm_bus_probe+0x6b/0x170
  [11257.765547]  really_probe+0xfb/0x3a0
  [11257.765549]  driver_probe_device+0x5f/0xe0
  [11257.765550]  device_driver_attach+0x5d/0x70
  [11257.765551]  bind_store+0xd3/0x110
  [11257.765553]  drv_attr_store+0x24/0x30
  [11257.765554]  sysfs_kf_write+0x3e/0x50
  [11257.765555]  kernfs_fop_write+0x11e/0x1a0
  [11257.765557]  __vfs_write+0x1b/0x40
  [11257.765558]  vfs_write+0xb9/0x1a0
  [11257.765559]  ksys_write+0x67/0xe0
  [11257.765561]  __x64_sys_write+0x1a/0x20
  [11257.765567]  do_syscall_64+0x5a/0x130
  [11257.765693]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [11257.765696] RIP: 0033:0x7febc9e81327
  [11257.765698] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
  [11257.765698] RSP: 002b:00007ffd599433f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  [11257.765699] RAX: ffffffffffffffda RBX: 00007febc9fd6ae8 RCX: 00007febc9e81327
  [11257.765700] RDX: 0000000000000007 RSI: 000055eec8a9bfa0 RDI: 0000000000000004
  [11257.765701] RBP: 0000000000000004 R08: 0000000000000006 R09: 7375622f7379732f
  [11257.765701] R10: 0000000000000000 R11: 0000000000000246 R12: 000055eec8a9bfa0
  [11257.765702] R13: 0000000000000001 R14: 0000000000000007 R15: 00007ffd59943448
  [11257.765703] ---[ end trace 442db04e33790cb5 ]---
  [11257.782659] nd_pmem: probe of pfn1.0 failed with error -22
  ----

  It seems that after this point I can no longer use my second virtual
  nvdimm device (pfn1.0).

  Destroying a namespace works, but creating one does not:

  rafaeldtinoco@ndctltest:~$ sudo ndctl list -B
  [
    {
      "provider":"ACPI.NFIT",
      "dev":"ndbus0"
    }
  ]

  rafaeldtinoco@ndctltest:~$ sudo ndctl list -D
  [
    {
      "dev":"nmem1",
      "id":"8680-57341200",
      "handle":2,
      "phys_id":0
    },
    {
      "dev":"nmem0",
      "id":"8680-56341200",
      "handle":1,
      "phys_id":0
    }
  ]

  rafaeldtinoco@ndctltest:~$ sudo ndctl list -R
  [
    {
      "dev":"region1",
      "size":1073610752,
      "available_size":1073610752,
      "max_available_extent":1073610752,
      "type":"pmem",
      "iset_id":52512795602891997,
      "persistence_domain":"unknown"
    },
    {
      "dev":"region0",
      "size":1073610752,
      "available_size":0,
      "max_available_extent":0,
      "type":"pmem",
      "iset_id":52512752653219036,
      "persistence_domain":"unknown"
    }
  ]
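
The region listing shows the asymmetry directly: region0 has no free
capacity (presumably held by namespace0.0) while region1 is entirely free,
yet its namespace cannot be brought up. A small Python sketch that
computes this from the JSON, assuming the default numeric output of
`ndctl list -R` (with `--human` the sizes are printed as strings and this
would need adapting). The sample is the output shown above.

```python
import json

# Copied from the `ndctl list -R` output above. On a live system the same
# JSON could be captured with:
#   subprocess.run(["ndctl", "list", "-R"], capture_output=True,
#                  text=True, check=True).stdout
SAMPLE = """
[
  {"dev":"region1","size":1073610752,"available_size":1073610752,
   "max_available_extent":1073610752,"type":"pmem",
   "iset_id":52512795602891997,"persistence_domain":"unknown"},
  {"dev":"region0","size":1073610752,"available_size":0,
   "max_available_extent":0,"type":"pmem",
   "iset_id":52512752653219036,"persistence_domain":"unknown"}
]
"""


def region_usage(json_text: str) -> dict:
    """Map region name -> (allocated bytes, free bytes)."""
    regions = json.loads(json_text)
    return {r["dev"]: (r["size"] - r["available_size"], r["available_size"])
            for r in regions}


if __name__ == "__main__":
    for dev, (used, free) in sorted(region_usage(SAMPLE).items()):
        print(f"{dev}: used={used} free={free}")
```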

  Now, whenever I try to access namespace1.0 (on region1/nmem1/ndbus), I
  get:

  [11257.782659] nd_pmem: probe of pfn1.0 failed with error -22
  [11332.001388] pfn0.0 initialised, 257024 pages in 8ms
  [11332.001818] pmem0: detected capacity change from 0 to 1052770304
  [11359.739280] pfn0.1 initialised, 257024 pages in 0ms
  [11362.643212] pfn0.0 initialised, 257024 pages in 0ms
  [11362.644225] pmem0: detected capacity change from 0 to 1052770304
  [11406.230365] pfn0.1 initialised, 257024 pages in 0ms
  [11406.231281] pmem0: detected capacity change from 0 to 1052770304
  [11517.785147] pfn0.0 initialised, 257024 pages in 4ms
  [11517.785593] pmem0: detected capacity change from 0 to 1052770304
  [11537.431697] pfn0.1 initialised, 257024 pages in 0ms
  [11537.432256] pmem0: detected capacity change from 0 to 1052770304
  [11627.965947] pfn0.0 initialised, 257024 pages in 0ms
  [11627.966415] pmem0: detected capacity change from 0 to 1052770304
  [11653.277667] pfn0.1 initialised, 257024 pages in 4ms
  [11653.278086] pmem0: detected capacity change from 0 to 1052770304
  [11708.696361] pfn0.0 initialised, 257024 pages in 0ms
  [11708.697617] pmem0: detected capacity change from 0 to 1052770304
  [11753.621295] nd_pmem btt0.0: No existing arenas
  [11753.623118] pmem0s: detected capacity change from 0 to 1071484928
  [11767.087424] pfn0.1 initialised, 257024 pages in 4ms
  [11767.088272] pmem0: detected capacity change from 0 to 1052770304
  [11775.815396] dax0.0 initialised, 257024 pages in 4ms
  [12848.341346] pfn0.0 initialised, 257024 pages in 0ms
  [12848.341785] pmem0: detected capacity change from 0 to 1052770304
  [12851.897716] nd_pmem: probe of pfn1.0 failed with error -22
  [13023.693246] pfn0.1 initialised, 257024 pages in 0ms
  [13023.693662] pmem0: detected capacity change from 0 to 1052770304
  [13026.517467] nd_pmem: probe of pfn1.0 failed with error -22
  [13067.380701] pmem0: detected capacity change from 0 to 1073610752
  [13117.568499] nd_pmem: probe of pfn1.0 failed with error -22
  [13946.604199] pfn0.0 initialised, 257024 pages in 0ms
  [13946.604777] pmem0: detected capacity change from 0 to 1052770304
  [13957.948381] nd_pmem: probe of pfn1.0 failed with error -22
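
The log can be cross-checked quickly: `error -22` is `-EINVAL` (RAX in the
trace, 00000000ffffffea, is also -22 when read as a signed 32-bit value),
and the `257024 pages` lines correspond to the `pmem0` capacity of
1052770304 bytes if pages are 4 KiB. A minimal Python sketch over a few
lines copied from the log; the 4 KiB page size is an assumption (x86_64
base pages), and the regular expressions only cover the message formats
shown here.

```python
import errno
import re

PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as on x86_64

PFN_RE = re.compile(r"\b(?:pfn|dax)\d+\.\d+ initialised, (\d+) pages")
CAP_RE = re.compile(r"\bpmem\d+s?: detected capacity change from 0 to (\d+)")
FAIL_RE = re.compile(r"probe of (\S+) failed with error (-\d+)")


def summarize(log: str):
    """Return (page counts, capacities, {failed device: errno name})."""
    pages = {int(m.group(1)) for m in PFN_RE.finditer(log)}
    capacities = {int(m.group(1)) for m in CAP_RE.finditer(log)}
    failures = {m.group(1): errno.errorcode[-int(m.group(2))]
                for m in FAIL_RE.finditer(log)}
    return pages, capacities, failures


SAMPLE = """\
[11257.782659] nd_pmem: probe of pfn1.0 failed with error -22
[11332.001388] pfn0.0 initialised, 257024 pages in 8ms
[11332.001818] pmem0: detected capacity change from 0 to 1052770304
[11775.815396] dax0.0 initialised, 257024 pages in 4ms
"""

if __name__ == "__main__":
    pages, capacities, failures = summarize(SAMPLE)
    for p in sorted(pages):
        # page count * page size vs. the reported block device capacity
        print(p, p * PAGE_SIZE, p * PAGE_SIZE in capacities)
    print(failures)  # {'pfn1.0': 'EINVAL'}
```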

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1855177/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
