On Wed, Oct 22, 2025 at 11:37:37AM -0700, Marc Herbert wrote:
> On 2025-10-21 14:26, Alison Schofield wrote:
> > The pmem_ns unit test frequently fails when run as part of the full
> > suite, yet passes when executed alone.
> >
> > [...]
> > Replace the NULL context parameter when calling ndctl_test_init()
> > with the available ndctl_ctx to ensure pmem_ns can find usable PMEM
> > regions.
> >
> > Reported-by: Marc Herbert <[email protected]>
> > Closes: https://github.com/pmem/ndctl/issues/290
> > Signed-off-by: Alison Schofield <[email protected]>
> > ---
> > test/pmem_namespaces.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c
> > index 4bafff5164c8..7b8de9dcb61d 100644
> > --- a/test/pmem_namespaces.c
> > +++ b/test/pmem_namespaces.c
> > @@ -191,7 +191,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
> >
> > if (!bus) {
> > fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
> > - rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
> > + rc = ndctl_test_init(&kmod_ctx, &mod, ctx, log_level, test);
> > ndctl_invalidate(ctx);
> > bus = ndctl_bus_get_by_provider(ctx, "nfit_test.0");
> > if (rc < 0 || !bus) {
>
> Thanks Alison! This does fix the crash, so you can also add my Tested-By:!
>
> But to test, I had to combine this fix with this temporary hack from
> https://github.com/pmem/ndctl/issues/290
Ah, yes, I did something similar to debug and test.
>
> --- a/test/pmem_namespaces.c
> +++ b/test/pmem_namespaces.c
> @@ -189,7 +189,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
> bus = NULL;
> }
>
> - if (!bus) {
> + if (!bus || true) {
> fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
> rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
> ndctl_invalidate(ctx);
>
>
>
> ... which explains why I disagree with... the commit message! I don't think
> this necessary fix "closes" https://github.com/pmem/ndctl/issues/290 entirely.
Marc,
Thanks for the review!
Ah, you disagree with the Closes tag? I added it expecting that the
test case will now pass: pmem-ns will successfully fall back to an
nfit_test region if ACPI.NFIT is not present or does not have a
PMEM-capable region.

As for why ACPI.NFIT fails to find a suitable region, I haven't
given up on it. In my setup, it fails because the region's nstype is
ND_DEVICE_NAMESPACE_IO (4) rather than ND_DEVICE_NAMESPACE_PMEM (5).

As for why it fails in your case (a full test run after boot) and with
my reproducer (simply running pmem-ns alone), I don't have the solution
yet. If you have time to check that your failure is the same as with my
reproducer, you can apply the debug patch below and share its output.
For reference, ND_DEVICE_NAMESPACE_IO is 4 and ND_DEVICE_NAMESPACE_PMEM
is 5.

diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c
index 4bafff5164c8..c2f25bb02025 100644
--- a/test/pmem_namespaces.c
+++ b/test/pmem_namespaces.c
@@ -180,11 +180,15 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
bus = ndctl_bus_get_by_provider(ctx, "ACPI.NFIT");
if (bus) {
+ int nstype;
+
/* skip this bus if no label-enabled PMEM regions */
- ndctl_region_foreach(bus, region)
- if (ndctl_region_get_nstype(region)
- == ND_DEVICE_NAMESPACE_PMEM)
+ ndctl_region_foreach(bus, region) {
+ nstype = ndctl_region_get_nstype(region);
+ fprintf(stderr, "ALISON nstype %d\n", nstype);
+ if (nstype == ND_DEVICE_NAMESPACE_PMEM)
break;
+ }
if (!region)
bus = NULL;
}
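
The debug print above only shows the raw number. A minimal standalone
sketch of a friendlier variant follows, assuming only the two values
listed above. nstype_name() and find_pmem_region() are hypothetical
helpers, not libndctl functions, and a plain array of ints stands in
for the ndctl_region_foreach() loop so the logic can be checked without
any NVDIMM hardware:

```c
#include <stdio.h>

/*
 * Values as quoted in this thread. The local names are prefixed with
 * SKETCH_ so they cannot clash with the real ND_DEVICE_* definitions.
 */
enum {
	SKETCH_NAMESPACE_IO = 4,
	SKETCH_NAMESPACE_PMEM = 5,
};

/* Hypothetical helper: map an nstype value to a short readable name. */
static const char *nstype_name(int nstype)
{
	switch (nstype) {
	case SKETCH_NAMESPACE_IO:
		return "NAMESPACE_IO";
	case SKETCH_NAMESPACE_PMEM:
		return "NAMESPACE_PMEM";
	default:
		return "other";
	}
}

/*
 * Hypothetical stand-in for the region scan: log every nstype that is
 * visited and return the index of the first PMEM entry, or -1 if the
 * array holds none (the case where the test falls back to nfit_test).
 */
static int find_pmem_region(const int *nstypes, int count)
{
	for (int i = 0; i < count; i++) {
		fprintf(stderr, "region %d: nstype %d (%s)\n",
			i, nstypes[i], nstype_name(nstypes[i]));
		if (nstypes[i] == SKETCH_NAMESPACE_PMEM)
			return i;
	}
	return -1;
}
```

With this shape, a log from a failing run would show at a glance
whether the bus had only IO-type regions or no regions at all.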
>
> This fix does stop the test from failing, which is great, and it
> dramatically lowers the severity of 290. But we still don't know why
> ACPI.NFIT is "available" most of the time and... sometimes not. In other
> words, we still don't know why this test is non-deterministic. Of course,
> there will always be some non-determinism because the kernel and QEMU are
> too complex to be deterministic, but I don't think non-determinism should
> extend to test fixtures and test code themselves like this.
> That is why 290 should stay open, IMHO.
>
> Also, this feels like a (missed?) opportunity to add better logging of this
> non-determinism, I mean stuff like:
> https://github.com/pmem/ndctl/issues/290#issuecomment-3260168362
> This is test code; it should not be stingy with logging. All bash scripts
> already run with "set -x", so this would not make much difference to the
> total volume.
>
>
> Generally speaking, tests should follow a CLEAN - TEST - CLEAN logic to
> minimize interference, as much as time allows[*]. Bug 290 demonstrates that:
> 1. Some unknown test running before pmem-ns does not clean up properly
> after itself, and
> 2. The pmem-ns test is not capable of creating a deterministic setup for
> itself.
>
> We still have no clue about 1., and 2. is not mitigated by logs
> or source comments. So there's still an open bug there.
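
For concreteness, the CLEAN - TEST - CLEAN pattern described above could
be sketched in C roughly as follows. This is only an illustration under
assumptions: struct test_case, run_clean_test_clean(), and the demo_*
callbacks are hypothetical names, not part of the ndctl test suite, and
an int counter stands in for whatever state an earlier test left behind.

```c
#include <stdio.h>

/* Hypothetical callback type: returns 0 on success, negative on error. */
typedef int (*step_fn)(void *state);

/* Hypothetical description of one test and its cleanup routine. */
struct test_case {
	const char *name;
	step_fn clean;
	step_fn run;
};

/*
 * Clean before the test so leftovers from earlier tests cannot
 * interfere, run the test, then clean again so this test leaves
 * nothing behind. Every step is logged.
 */
static int run_clean_test_clean(const struct test_case *tc, void *state)
{
	int rc, clean_rc;

	rc = tc->clean(state);
	if (rc < 0) {
		fprintf(stderr, "%s: pre-clean failed: %d\n", tc->name, rc);
		return rc;
	}

	rc = tc->run(state);
	fprintf(stderr, "%s: test returned %d\n", tc->name, rc);

	clean_rc = tc->clean(state);
	if (clean_rc < 0)
		fprintf(stderr, "%s: post-clean failed: %d\n",
			tc->name, clean_rc);

	/* A test failure takes precedence over a cleanup failure. */
	return rc < 0 ? rc : clean_rc;
}

/* Demo callbacks: the state is an int counting leftover resources. */
static int demo_clean(void *state)
{
	*(int *)state = 0;
	return 0;
}

static int demo_run(void *state)
{
	int *leftover = state;

	if (*leftover != 0)
		return -1;	/* environment was not clean */
	*leftover = 1;		/* the test creates one resource */
	return 0;
}
```

Starting with a non-zero leftover count, the demo test still passes
because the pre-clean step resets the state first, and the post-clean
step returns the count to zero for whichever test runs next.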
>
> Marc
>
>
>
>
> [*] there are practical limits: rebooting QEMU for each test would be too
> slow.