Hi,

On 08-04-2024 3:51 a.m., 陈 晟祺 wrote:
> With resources limited to one CPU (AMD EPYC 7551) and 2 GB of memory,
> my local test could now reproduce the test hang and the subsequent timeout error.

Ouch.

> I think it is caused by insufficient resources (e.g. the OOM killer, but I am
> not sure). Even if we can work around it, the test process would still be too
> slow to finish.

> Is it possible to allocate more resources for the test? For reference, openzfs
> uses GitHub-hosted workflow runners [1] for testing. Each runner has 2 CPU
> cores and 7 GB of memory, and with that configuration the whole test suite
> still takes ~4 hours.

Our timeout is 10000 seconds, so about 2.78 hours, per autopkgtest stanza (overall it's 8 hours). If the test is going to take longer than that, it will fail anyway. So maybe it was just still running? I'm a bit hesitant, particularly about the memory, to make much bigger VMs, because most tests don't need it and it limits the number of VMs we can run. We need to strike a nice balance (or fix https://salsa.debian.org/ci-team/debci/-/issues/166#note_451831 and add zfs-linux to a "huge" list).

> If not, is there any way to mark the test as optional (thus not causing an RC
> bug)? Otherwise our worst choice would be to disable the test completely.

Well, if we can't run the test on our infra, we could disable it, but what's the point of having the autopkgtest then? (If you split the tests over multiple stanzas, you get the 2.78 hours per set. Does that help?)
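
For illustration, splitting could look roughly like this in debian/tests/control; the test script names below are made up and the restrictions are only a guess at what a ZFS suite needs, but each stanza would then get its own 10000-second budget:

  Tests: zfs-tests-part1
  Depends: @
  Restrictions: needs-root, isolation-machine, allow-stderr

  Tests: zfs-tests-part2
  Depends: @
  Restrictions: needs-root, isolation-machine, allow-stderr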

Let me try to see if I can have debci create larger VM's for us and let me try your package again. What are the resources you use yourself for the test and how long does it take in that case?
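
For a local comparison, something like this with the plain autopkgtest qemu runner (the image name is a placeholder) would give the testbed more resources than our workers currently have:

  autopkgtest zfs-linux -- qemu --cpus=4 --ram-size=8192 autopkgtest-unstable.img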

Paul
