On 26/07/2024 21:37, Ira Weiny wrote:
> Li Zhijian wrote:
>> The leakage would happen when create_namespace_pmem() meets an invalid
>> label which gets failure in validating isetcookie.
>>
>> Try to reuse the devs that may have already been allocated with size
>> (2 * sizeof(dev)) previously.
>>
>> A kmemleak reports:
>> unreferenced object 0xffff88800dda1980 (size 16):
>>   comm "kworker/u10:5", pid 69, jiffies 4294671781
>>   hex dump (first 16 bytes):
>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>   backtrace (crc 0):
>>     [<00000000c5dea560>] __kmalloc+0x32c/0x470
>>     [<000000009ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 [libnvdimm]
>>     [<000000000e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm]
>>     [<000000007b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm]
>>     [<00000000a5f3da2e>] really_probe+0xc6/0x390
>>     [<00000000129e2a69>] __driver_probe_device+0x78/0x150
>>     [<000000002dfed28b>] driver_probe_device+0x1e/0x90
>>     [<00000000e7048de2>] __device_attach_driver+0x85/0x110
>>     [<0000000032dca295>] bus_for_each_drv+0x85/0xe0
>>     [<00000000391c5a7d>] __device_attach+0xbe/0x1e0
>>     [<0000000026dabec0>] bus_probe_device+0x94/0xb0
>>     [<00000000c590d936>] device_add+0x656/0x870
>>     [<000000003d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm]
>>     [<000000003f4c52a4>] async_run_entry_fn+0x2e/0x110
>>     [<00000000e201f4b0>] process_one_work+0x1ee/0x600
>>     [<000000006d90d5a9>] worker_thread+0x183/0x350
>>
>> Cc: Dave Jiang <dave.ji...@intel.com>
>> Cc: Ira Weiny <ira.we...@intel.com>
>> Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation")
>
> What is the bigger effect the user will see?

*Users* cannot use this pmem until they zero-label the device.
As I understand it, once the leak occurs, the memory cannot be reclaimed
until reboot.

>
> Does this cause a long term user effect?  For example, if a user's system
> has a bad label I think this is going to be a pretty minor memory leak
> which just hangs around until reboot, correct?
>
>> Signed-off-by: Li Zhijian <lizhij...@fujitsu.com>
>> ---
>>
>> Cc: Ira Weiny <ira.we...@intel.com>
>>> From what I can tell create_namespace_pmem() must be returning EAGAIN
>>> which leaves devs allocated but fails to increment count.  Thus there are
>>> no valid labels but devs was not free'ed.
>>
>>> Can you trace the error you are seeing a bit more to see if this is the
>>> case?
>> Hi Ira, Sorry for the late reply. I have reproduced it these days.
>> Yeah, the LSA is containing a label in which the isetcookie is invalid.
>
> NP, sometimes it takes a while to really debug something.
>
>>
>> V2:
>> update description and comment
>> ---
>>  drivers/nvdimm/namespace_devs.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
>> index d6d558f94d6b..28c9afc01dca 100644
>> --- a/drivers/nvdimm/namespace_devs.c
>> +++ b/drivers/nvdimm/namespace_devs.c
>> @@ -1994,7 +1994,13 @@ static struct device **scan_labels(struct nd_region *nd_region)
>>  	/* Publish a zero-sized namespace for userspace to configure. */
>>  	nd_mapping_free_labels(nd_mapping);
>>
>> -	devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
>> +	/*
>> +	 * Try to use the devs that may have already been allocated
>> +	 * above first. This would happen when create_namespace_pmem()
>> +	 * meets an invalid label.
>> +	 */
>> +	if (!devs)
>> +		devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
>
> I'm still tempted to try and fix the count but I think this will work.

I don't follow what you mean by *fix the count*.
Does "fix the count" mean freeing devs in the error case, like this?

$ git diff
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 28c9afc01dca..3fae00a05ad7 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1970,6 +1970,10 @@ static struct device **scan_labels(struct nd_region *nd_region)
 		dev = create_namespace_pmem(nd_region, nd_mapping, nd_label);
 		if (IS_ERR(dev)) {
+			if (!count) {
+				kfree(devs);
+				devs = NULL;
+			}

> Let me know about the severity of the issue.
>
> Ira
>
>>  	if (!devs)
>>  		goto err;
>>
>> --
>> 2.29.2
>>
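
For anyone following the thread, the leak pattern discussed above can be
sketched in user space. This is only a simplified model under stated
assumptions, not the kernel code: model_calloc(), model_free(), model_scan()
and live_allocs are hypothetical names made up for illustration, and every
label is assumed to be rejected so that count stays at 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter standing in for kmemleak: allocations not yet freed. */
static int live_allocs;

static void *model_calloc(size_t n, size_t size)
{
	void *p = calloc(n, size);

	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (!p)
		return;
	assert(live_allocs > 0);
	live_allocs--;
	free(p);
}

/*
 * Model of the scan: the array is grown for each label before the label is
 * validated, every label is then rejected, and count is never incremented.
 * If reuse is nonzero, the fallback path applies the "if (!devs)" guard.
 * Returns the number of leaked allocations, or -1 if allocation fails.
 */
static int model_scan(int nr_invalid_labels, int reuse)
{
	void **devs = NULL, **grown;
	int count = 0;	/* stays 0: no label is ever accepted */
	int i;

	live_allocs = 0;
	for (i = 0; i < nr_invalid_labels; i++) {
		grown = model_calloc(count + 2, sizeof(*grown));
		if (!grown) {
			model_free(devs);
			return -1;
		}
		/* Nothing to copy over because count is 0. */
		model_free(devs);
		devs = grown;
		/* Validation fails here: skip the label, leave count alone. */
	}

	if (count == 0) {
		/* Fallback: publish a zero-sized namespace. */
		if (!reuse || !devs) {
			/* Without the guard the old pointer is overwritten. */
			devs = model_calloc(2, sizeof(*devs));
			if (!devs)
				return -1;
		}
	}

	model_free(devs);	/* the caller frees only the array it received */
	return live_allocs;
}
```

In this model, model_scan(1, 0) returns 1 (one array is orphaned, as in the
kmemleak report) and model_scan(1, 1) returns 0. The alternative of freeing
devs in the error path when count is 0 would also bring the result to 0 in
this model, at the cost of a second allocation in the fallback path.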