On 2025-10-21 14:26, Alison Schofield wrote:
> The pmem_ns unit test frequently fails when run as part of the full
> suite, yet passes when executed alone.
>
> [...]
>
> Replace the NULL context parameter when calling ndctl_test_init()
> with the available ndctl_ctx to ensure pmem_ns can find usable PMEM
> regions.
>
> Reported-by: Marc Herbert <[email protected]>
> Closes: https://github.com/pmem/ndctl/issues/290
> Signed-off-by: Alison Schofield <[email protected]>
> ---
>  test/pmem_namespaces.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c
> index 4bafff5164c8..7b8de9dcb61d 100644
> --- a/test/pmem_namespaces.c
> +++ b/test/pmem_namespaces.c
> @@ -191,7 +191,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
>  
>  	if (!bus) {
>  		fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
> -		rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
> +		rc = ndctl_test_init(&kmod_ctx, &mod, ctx, log_level, test);
>  		ndctl_invalidate(ctx);
>  		bus = ndctl_bus_get_by_provider(ctx, "nfit_test.0");
>  		if (rc < 0 || !bus) {
Thanks Alison! This does fix the crash, so you can also add my Tested-by!

But to test it, I had to combine this fix with this temporary hack from
https://github.com/pmem/ndctl/issues/290

--- a/test/pmem_namespaces.c
+++ b/test/pmem_namespaces.c
@@ -189,7 +189,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
 		bus = NULL;
 	}
 
-	if (!bus) {
+	if (!bus || true) {
 		fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
 		rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
 		ndctl_invalidate(ctx);

... which explains why I disagree with... the commit message! I don't
think this necessary fix "closes" https://github.com/pmem/ndctl/issues/290
entirely. This fix does stop the test from failing, which is great, and
it dramatically lowers the severity of 290. But we still don't know why
ACPI.NFIT is "available" most of the time and... sometimes not. In other
words, we still don't know why this test is non-deterministic.

Of course, there will always be some non-determinism because the kernel
and QEMU are too complex to be deterministic, but I don't think
non-determinism should extend to the test fixtures and test code
themselves like this. This is why 290 should stay open, IMHO.

Also, this feels like a (missed?) opportunity to add better logging of
this non-determinism, I mean things like:
https://github.com/pmem/ndctl/issues/290#issuecomment-3260168362

This is test code; it should not be stingy with logging. All bash scripts
already run with "set -x", so this would not make much difference to the
total volume.

Generally speaking, tests should follow a CLEAN - TEST - CLEAN logic to
minimize interference, as much as time allows[*]. Bug 290 demonstrates
that:

1. Some unknown test running before pmem-ns does not clean up properly
   after itself, and
2. The pmem-ns test is not capable of creating a deterministic setup for
   itself.

We still have no clue about 1., and 2. is not mitigated by logs or source
comments. So there is still an open bug there.
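To make the CLEAN - TEST - CLEAN idea and the "log which path was taken" suggestion concrete, here is a minimal sketch. It is an assumption-laden illustration only: struct test_env, clean_env(), run_test() and clean_test_clean() are hypothetical names that merely simulate fixture state (leftover namespaces, whether ACPI.NFIT is available). None of this is the real ndctl test API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical, simulated test environment: stands in for whatever
 * leftovers (regions, namespaces, loaded modules) an earlier test may
 * leave behind. Not an ndctl structure.
 */
struct test_env {
	int leftover_namespaces;
	bool nfit_available;
};

/* CLEAN: reset to a known state, and log what was found so it is visible. */
static void clean_env(struct test_env *env, const char *phase)
{
	if (env->leftover_namespaces > 0)
		fprintf(stderr, "%s: removing %d leftover namespace(s)\n",
			phase, env->leftover_namespaces);
	env->leftover_namespaces = 0;
}

/* TEST: log which setup path is taken so non-determinism shows up in logs. */
static int run_test(struct test_env *env)
{
	if (env->nfit_available)
		fprintf(stderr, "test: using ACPI.NFIT bus\n");
	else
		fprintf(stderr, "test: ACPI.NFIT unavailable, falling back to nfit_test\n");

	/* Verify the precondition instead of silently depending on it. */
	if (env->leftover_namespaces != 0) {
		fprintf(stderr, "test: unexpected state, %d namespace(s) present\n",
			env->leftover_namespaces);
		return -1;
	}

	env->leftover_namespaces = 1; /* the simulated test creates one namespace */
	return 0;
}

/* CLEAN - TEST - CLEAN: the post-clean runs even if the test failed. */
int clean_test_clean(struct test_env *env)
{
	int rc;

	assert(env != NULL);
	clean_env(env, "pre-clean");
	rc = run_test(env);
	clean_env(env, "post-clean");
	return rc;
}
```

For example, starting from a simulated environment with two leftover namespaces and no ACPI.NFIT, clean_test_clean() logs the pre-clean, the nfit_test fallback and the post-clean to stderr, returns 0, and leaves no namespaces behind for the next test.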
Marc

[*] There are practical limits: rebooting QEMU for each test would be
too slow.
