ohio,

i think i got it. thanks to your kind vm contribution Santiago! <3

i put some extra debug output in the code, which showed me that there were
files in the state (stored under /tmp/sphinx-oracle-root.*) that should not
exist. this state is deleted recursively between test cases. and the error in
this bug is that there was a file where there shouldn't have been one, since
it was deleted before the testcase is called.

what a weird race condition! is it possible that files don't get deleted
immediately, and that i can read something after it got deleted? that sounded
weird. but then i figured, maybe the files get written after the deletion.

for a little context, the test starts 5 servers in the background, which the
test cases communicate with. and these servers write the state in
/tmp/sphinx-oracle-root.*/servers/[01234]/data/*
and then i looked, and saw the vm has only 2 cpus. could it be, that some of
the servers are not quick enough to finish their writing before the test-case
cleanup?
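
to make the suspected ordering concrete, here is a minimal sketch of it. this
is not the pwdsphinx test code: the directory prefix, the file name and the
function name are made up, and it assumes the late writer recreates missing
directories before writing. an event forces the "slow server" to write only
after the cleanup has run, so the leftover file shows up every time.

```python
import os
import shutil
import tempfile
import threading


def late_write_survives_cleanup():
    """Return True if a file written after cleanup is still there afterwards."""
    # hypothetical stand-in for the per-test state directory
    root = tempfile.mkdtemp(prefix="race-demo-")
    data_dir = os.path.join(root, "servers", "0", "data")
    os.makedirs(data_dir)
    record = os.path.join(data_dir, "record")
    cleanup_done = threading.Event()

    def slow_server():
        # stand-in for a server starved of cpu: it writes only after cleanup
        cleanup_done.wait()
        # assumption: the writer recreates missing directories
        os.makedirs(data_dir, exist_ok=True)
        with open(record, "w") as fh:
            fh.write("stale state")

    writer = threading.Thread(target=slow_server)
    writer.start()

    shutil.rmtree(root)   # test-case cleanup runs first
    cleanup_done.set()    # only now does the slow server get to write
    writer.join()

    leaked = os.path.exists(record)
    shutil.rmtree(root, ignore_errors=True)  # tidy up the demo itself
    return leaked
```

calling `late_write_survives_cleanup()` reports whether the next test case
would start with a file that the cleanup was supposed to have removed.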

i fired up `stress -c 2 &` (see stress(1)) and what had been a non-deterministic
trigger of the bug became very much deterministic. so i put a sleep(0.1) at the start
of the testcase cleanup, and while still having `stress(1)` running, tried
again. and it all seemed to be working.
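
roughly, the workaround has this shape. again a sketch only: `cleanup_state`
and its parameters are hypothetical names, not the actual test harness, and
0.1s is just the value mentioned above. note that a sleep only narrows the
window for late writes, it does not close it.

```python
import shutil
import time


def cleanup_state(root, grace=0.1):
    """Remove the state dir after giving background writers a moment to finish."""
    # give the background servers a chance to flush their pending writes
    time.sleep(grace)
    # then delete the whole state tree, as the cleanup did before
    shutil.rmtree(root, ignore_errors=True)
```

a sturdier variant would wait for the servers to confirm they are idle
instead of sleeping for a fixed time, but that needs cooperation from the
servers.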

then i got bold, and reenabled all the other disabled testcases that also
mysteriously and irreproducibly were failing randomly,
(see list here:
https://salsa.debian.org/debian/pwdsphinx/-/blob/debian/1.99.2-beta-5/debian/patches/0006-disable-some-tests.patch?ref_type=tags
)

and without the sleep they are failing badly, in the same manner that made us
disable them (and which we could never reproduce locally), but with the
sleep before the cleanup they also seem to be succeeding consistently.

so i suppose not only have we fixed this bug in the tests that was failing due
to the environment (and not the test subject) but we also figured out why the
other tests were failing, which we can now reenable \o/

i'll make a new release asap (we also got the last two features done for
v2.0), so i'll tag a v2.0-rc1 in the next 24h if all goes well.

thanks a lot for this valuable contribution Santiago!
s
