Hi All,

On 01/01/2026 14:52, Samuel Thibault wrote:
> Michael Kelly, le jeu. 01 janv. 2026 14:48:19 +0000, a ecrit:
> > > It's hard to say. The 4 buildds keep building packages all day long, and
> > > I notice such "stray" errors on one of them like every one day or two.
> >
> > That's possibly as rare as the stress-ng bit errors then given that my machine
> > is almost certainly slower than those supporting the buildds.
>
> It may then be simpler to just reproduce it with stress-ng, since then
> you'd know exactly what it was doing, while package installation etc. is
> a mess of things that happen :)
>
> Samuel

I'm posting an update on this investigation because, like others, I'm likely to have less time to look into this from tomorrow.

I have managed to adjust the stress-ng parameters so that the likelihood of 'bit error' reports is close to 100%. A test like the following fails for me almost every time, both on a 4GB hurd-amd64 virtual machine and on real hardware with 4GB:

# stress-ng -t 20s --metrics --vm 64 --vm-bytes 1800M --vm-method incdec

With errors like:

stress-ng: fail:  [3947] vm: detected 141733920769 bit errors while stressing memory
stress-ng: fail:  [3984] vm: detected 2 bit errors while stressing memory
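
To put a number on "almost every time", the test can be repeated and the failing runs counted. The following is only a minimal sketch: the function name count_bit_error_runs is hypothetical, and it assumes that a failing run always prints the phrase "bit errors" somewhere in its output, as in the lines above.

```shell
# Hypothetical helper: run a command N times and print how many runs
# mentioned "bit errors" in their output.
# Usage: count_bit_error_runs N command [args...]
count_bit_error_runs() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Capture stdout and stderr together, since either may carry the report.
        out=$("$@" 2>&1) || true
        case $out in
            *"bit errors"*) failed=$((failed + 1)) ;;
        esac
        i=$((i + 1))
    done
    echo "$failed"
}

# Example (not executed here):
#   count_bit_error_runs 10 stress-ng -t 20s --metrics --vm 64 --vm-bytes 1800M --vm-method incdec
```

Matching on the output text rather than on the exit status keeps the sketch independent of how stress-ng reports a failure to the shell.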

That's the good news. The bad news is that I suspect the cause is related to the handling of the signals that are used to terminate the stress-ng worker (the oomable child). The first error reported above has a value that is nonsense given the size of the memory region being worked on. I added some debug output to the stress-ng code and saw some extraordinary things that made no sense at all, with stack variables seemingly changing 'randomly'. It seems suspicious to me that these things only start occurring after the first signal is delivered to the process. This all needs a thorough investigation when time permits.
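
As a rough check on why the first count looks like nonsense, the arithmetic below compares it with the number of bits in a full 1800M region. This is a sketch under two assumptions: that 1800M means 1800 MiB, and that the count should not exceed the number of bits in the region being tested (the region used by a single worker may well be smaller still).

```shell
# Count taken from the first failure line quoted above.
count=141733920769

# Assumed upper bound: every bit of an 1800 MiB region reported once.
region_bits=$((1800 * 1024 * 1024 * 8))

echo "region bits: $region_bits"               # 15099494400
echo "count / region bits: $((count / region_bits))"   # 9 (integer division)
printf 'count in hex: 0x%x\n' "$count"         # 0x2100000001
```

The reported count is more than nine times that bound. In hexadecimal it is 0x2100000001, that is, a low 32-bit word of 1 with 0x21 in the upper word. That pattern might point to a 64-bit counter whose upper half was overwritten, but that reading is only a guess and could be coincidence.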

In any case, the same test does not fail when run on hurd-i386: it completes perfectly over many tens of iterations. This indicates that the stress-ng bit errors are not related to the buildd issues. I've had no luck recreating the buildd issue but will return to it when time permits.

Regards,

Mike.
