On 2025-10-21 14:26, Alison Schofield wrote:
> The pmem_ns unit test frequently fails when run as part of the full
> suite, yet passes when executed alone.
> 
> [...]
> Replace the NULL context parameter when calling ndctl_test_init()
> with the available ndctl_ctx to ensure pmem_ns can find usable PMEM
> regions.
> 
> Reported-by: Marc Herbert <[email protected]>
> Closes: https://github.com/pmem/ndctl/issues/290
> Signed-off-by: Alison Schofield <[email protected]>
> ---
>  test/pmem_namespaces.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/test/pmem_namespaces.c b/test/pmem_namespaces.c
> index 4bafff5164c8..7b8de9dcb61d 100644
> --- a/test/pmem_namespaces.c
> +++ b/test/pmem_namespaces.c
> @@ -191,7 +191,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
>  
>       if (!bus) {
>               fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
> -             rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
> +             rc = ndctl_test_init(&kmod_ctx, &mod, ctx, log_level, test);
>               ndctl_invalidate(ctx);
>               bus = ndctl_bus_get_by_provider(ctx, "nfit_test.0");
>               if (rc < 0 || !bus) {

Thanks Alison! This does fix the crash, so you can also add my Tested-by:!

But to test, I had to combine this fix with this temporary hack from
https://github.com/pmem/ndctl/issues/290

--- a/test/pmem_namespaces.c
+++ b/test/pmem_namespaces.c
@@ -189,7 +189,7 @@ int test_pmem_namespaces(int log_level, struct ndctl_test *test,
                        bus = NULL;
        }
 
-       if (!bus) {
+       if (!bus || true) {
                fprintf(stderr, "ACPI.NFIT unavailable falling back to nfit_test\n");
                rc = ndctl_test_init(&kmod_ctx, &mod, NULL, log_level, test);
                ndctl_invalidate(ctx);



... which explains why I disagree with... the commit message! I don't think
this necessary fix "closes" https://github.com/pmem/ndctl/issues/290 entirely.

This fix does stop the test from failing, which is great, and it dramatically
lowers the severity of 290. But we still don't know why ACPI.NFIT is
"available" most of the time and... sometimes not. In other words, we still
don't know why this test is non-deterministic. Of course, there will always
be some non-determinism because the kernel and QEMU are too complex to be
deterministic, but I don't think non-determinism should extend to test
fixtures and test code themselves like this. That's why 290 should stay open
IMHO.

Also, this feels like a (missed?) opportunity to add better logging of this
non-determinism, I mean stuff like:
https://github.com/pmem/ndctl/issues/290#issuecomment-3260168362
This is test code; it should not be stingy with logging. All bash scripts
already run with "set -x", so this would not make much difference to the
total volume.
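To make the suggestion concrete, here is a minimal C sketch of that kind of
logging. choose_and_log_bus() and enum bus_choice are hypothetical, not ndctl
functions, and the two inputs are assumed stand-ins for whatever the real test
checks after looking up the "ACPI.NFIT" bus (whether the bus exists, and how
many usable PMEM regions it has). The only point is that each fallback path
prints its own reason, so a flaky CI log shows which one was taken:

```c
#include <assert.h>
#include <stdio.h>

/* Which bus the test ends up using (hypothetical, for illustration only). */
enum bus_choice { USE_ACPI_NFIT, USE_NFIT_TEST };

/*
 * Hypothetical helper, NOT an ndctl API: pick ACPI.NFIT or the nfit_test
 * fallback and log *why*, so a non-deterministic run leaves a trace.
 *   acpi_bus_found:    whether an "ACPI.NFIT" bus was found at all (assumed input)
 *   acpi_pmem_regions: number of usable PMEM regions on that bus (assumed input)
 */
enum bus_choice choose_and_log_bus(FILE *log, int acpi_bus_found,
				   int acpi_pmem_regions)
{
	assert(log != NULL);

	if (!acpi_bus_found) {
		fprintf(log, "pmem_ns: no ACPI.NFIT bus, falling back to nfit_test\n");
		return USE_NFIT_TEST;
	}
	if (acpi_pmem_regions == 0) {
		fprintf(log, "pmem_ns: ACPI.NFIT bus has no usable PMEM region, "
			"falling back to nfit_test\n");
		return USE_NFIT_TEST;
	}
	fprintf(log, "pmem_ns: using ACPI.NFIT bus (%d PMEM region(s))\n",
		acpi_pmem_regions);
	return USE_ACPI_NFIT;
}
```

In pmem_namespaces.c something like this would replace the single fprintf()
before ndctl_test_init(); how the real region check maps onto a simple count
is an assumption.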


Generally speaking, tests should follow a CLEAN - TEST - CLEAN logic to
minimize interference, as much as time allows[*]. Bug 290 demonstrates that:
1. Some unknown test running before pmem-ns does not clean up properly after itself, and
2. The pmem-ns test is not capable of creating a deterministic setup for itself.
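A minimal sketch of the CLEAN - TEST - CLEAN pattern, assuming a hypothetical
harness: run_clean_test_clean(), step_fn and the toy fixture are illustrations,
not part of the ndctl test suite. The trailing cleanup runs even when the test
fails, which is what item 1 above suggests some earlier test does not do:

```c
#include <assert.h>
#include <stdio.h>

/* A step returns 0 on success, non-zero on failure. */
typedef int (*step_fn)(void *state);

/*
 * Hypothetical harness, NOT ndctl code. The leading clean removes whatever
 * an earlier test left behind; the trailing clean always runs, even if the
 * test failed, so this test does not pollute the next one.
 */
int run_clean_test_clean(const char *name, step_fn clean, step_fn test,
			 void *state)
{
	int rc;

	assert(clean != NULL && test != NULL);

	if (clean(state) != 0) {
		fprintf(stderr, "%s: pre-test cleanup failed, test not run\n", name);
		return -1;
	}
	rc = test(state);
	if (clean(state) != 0) {
		fprintf(stderr, "%s: post-test cleanup failed\n", name);
		if (rc == 0)
			rc = -1; /* a test that cannot clean up is not a pass */
	}
	return rc;
}

/* Toy fixture for illustration: counts leftover "resources". */
struct toy_fixture { int leftovers; int cleans; };

int toy_clean(void *p)
{
	struct toy_fixture *f = p;

	f->leftovers = 0;
	f->cleans++;
	return 0;
}

/* Leaks three resources and then fails, like a sloppy earlier test. */
int toy_test_leaky(void *p)
{
	struct toy_fixture *f = p;

	f->leftovers += 3;
	return 1;
}
```

Even though the toy test leaks and fails, the harness leaves the fixture
clean for whoever runs next.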

We still have no clue about 1., and 2. is not even mitigated with logs or
source comments. So there's still an open bug there.

Marc




[*] there are practical limits: rebooting QEMU for each test would be too slow.
