Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-05-19 Thread Paul Gevers

Hi,

On 19-05-2024 2:55 p.m., 陈 晟祺 wrote:

My concern now is that the results do not seem to be stable or reproducible.


That's a recurring problem in lots of places, yes.


Is there any convention for handling such a situation? E.g., should I mark all
zfs-test-suite-x as flaky and treat them as reference only?


It depends ;)

The disadvantage of marking the whole test stanza as flaky is that it 
won't block regressions at all. Depending on how the test (I mean per 
stanza in d/t/control) is set up, it makes more sense to mark individual 
tests as flaky than the whole suite/stanza. However, if there's not 
enough granularity, that doesn't really help.


Then there's the infrastructure argument. If your test is not a cheap 
one, running a long test only to have it fail flakily is a rather high 
price for very little gain. Then it might make more sense to not run the 
test by default (add an unknown restriction, for example) and only use 
the test for manual checking, where you can judge (or rerun) the test as 
you see fit.


In the end it's your decision. All I can say is that tests that are 
flaky enough (my level is roughly worse than 1/8) and not marked as such 
are considered RC buggy.
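
A minimal sketch of the rule of thumb above, assuming "worse than 1/8" means an observed failure rate strictly above 1/8. The function name and the strict comparison are illustrative assumptions, not an official release-team definition.

```python
from fractions import Fraction

# Assumption: "roughly worse than 1/8" is read as a failure rate > 1/8.
FLAKY_THRESHOLD = Fraction(1, 8)


def is_flaky_enough(failures, runs, threshold=FLAKY_THRESHOLD):
    """Return True if the observed failure rate exceeds the threshold."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return Fraction(failures, runs) > threshold


# Two failures in seven runs, as reported for zfs-test-suite in this thread.
print(is_flaky_enough(2, 7))
```

With only a handful of runs the estimate is noisy, so this is a way to make the threshold explicit rather than a statistical test.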


Paul


OpenPGP_signature.asc
Description: OpenPGP digital signature


Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-05-19 Thread 陈 晟祺
Hi,

> On 19 May 2024, at 13:51, Paul Gevers wrote:
> 
> I already noticed yesterday and had it run; it failed. (Currently) top one 
> here: https://ci.debian.net/packages/z/zfs-linux/testing/amd64/
> 

My concern now is that the results do not seem to be stable or reproducible.

Seven tests have been run since yesterday, of which two failed on
zfs-test-suite-foo. Most of these failures may be false positives, and a
re-run typically makes them pass. Almost every single test in the failed
list can pass independently, but the whole test sequence is failure-prone.
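
A rough illustration of why a long sequence can be failure-prone even when each test is individually reliable. It assumes the tests fail independently with the same probability, which is a simplification; the 0.999 figure is purely hypothetical, and 686 is the test count from one results summary elsewhere in this thread (681 PASS + 2 FAIL + 3 SKIP).

```python
def suite_pass_probability(per_test_pass_probability, test_count):
    """Probability that every test passes, assuming independent tests."""
    return per_test_pass_probability ** test_count


# Hypothetical: each test passes 99.9% of the time, 686 tests in the run.
p = suite_pass_probability(0.999, 686)
print(f"{p:.2f}")
```

Under these assumptions only about half of the full runs would come out clean, even though any single test almost always passes on its own.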

I have also observed similar behavior when testing locally (and also in 
upstream CI).
Is there any convention for handling such a situation? E.g., should I mark all
zfs-test-suite-x as flaky and treat them as reference only?

Thanks,
Shengqi Chen



Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-05-18 Thread Paul Gevers

Hi

On 19-05-2024 6:25 a.m., 陈 晟祺 wrote:

I have made more adjustments, basically skipping some flaky tests in the VM.
The new version 2.2.4-1 is now in the archive; please try again when it is available.


I already noticed yesterday and had it run; it failed. (Currently) top 
one here: https://ci.debian.net/packages/z/zfs-linux/testing/amd64/


Paul




Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-05-18 Thread 陈 晟祺
Hi,

> 
> The test ran. Unfortunately zfs-test-suite-1 failed.
> 

I have made more adjustments, basically skipping some flaky tests in the VM.
The new version 2.2.4-1 is now in the archive; please try again when it is available.

Thanks,
Shengqi Chen



Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-21 Thread Paul Gevers

Hi,

On 18-04-2024 10:25 p.m., Paul Gevers wrote:
I'll hopefully do the changes tomorrow. (RL work is a bit busy at the 
moment.)


The test ran. Unfortunately zfs-test-suite-1 failed.

https://ci.debian.net/packages/z/zfs-linux/unstable/amd64/45683824/

4089s Results Summary
4089s PASS 681
4089s FAIL   2
4089s SKIP   3

Seems like we're nearly there.
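
For anyone who wants to track these numbers across runs, here is a small sketch that pulls the counts out of a log excerpt like the one above. The regular expression is an assumption based only on the lines quoted here, not on a documented log format.

```python
import re

# Matches lines such as "4089s PASS   681"; the leading timestamp is optional.
SUMMARY_LINE = re.compile(r"^(?:\d+s\s+)?(PASS|FAIL|SKIP)\s+(\d+)$")


def parse_results_summary(log_text):
    """Return a dict mapping PASS/FAIL/SKIP to the counts found in the log."""
    counts = {}
    for line in log_text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


LOG_EXCERPT = """4089s Results Summary
4089s PASS   681
4089s FAIL 2
4089s SKIP 3"""

counts = parse_results_summary(LOG_EXCERPT)
total = sum(counts.values())
print(counts, total)
```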

(I made a tiny mistake in that run, as I had 8GB RAM; I have now lowered 
it to 4GB which will be the setting until further discussion is warranted).


Paul




Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-18 Thread Paul Gevers

Hi,

On 14-04-2024 5:14 a.m., 陈 晟祺 wrote:

I will ask aron to review & upload a new version; then we can test on
debci infra and see whether it solves the problem.


I forgot I promised changes to the settings. Without those changes, it 
doesn't end nicely:


https://ci.debian.net/packages/z/zfs-linux/unstable/amd64/45540021/

I'll hopefully do the changes tomorrow. (RL work is a bit busy at the 
moment.)


Paul




Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-14 Thread Paul Gevers

Hi,

On 14-04-2024 5:14 a.m., 陈 晟祺 wrote:

When using a non-ramdisk tmpdir (/var/tmp) with some large tests skipped [1],
the tests run with 2 cores + 4GB memory + ~10GB disk space.
I also tried 2GB and 3GB; both runs were interrupted by the OOM killer.


So, let's settle on 2+4 for now. That sounds like a value we could very 
reasonably support. I'll configure our setup for that.



I will ask aron to review & upload a new version; then we can test on
debci infra and see whether it solves the problem.


Ack.

Paul




Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-13 Thread 陈 晟祺
Control: tags -1 + pending

Hi,

> On 13 April 2024, at 01:29, 陈 晟祺 wrote:
> 
> I am now trying to run tests on 2 core and 4GB memory (and maybe less later).
> If the tester itself does not occupy too much RAM, the real requirement for 
> resources
> is now probably several gigabytes of disk space (currently it’s ~10GB).
> 
> I will give more feedback once new results come out.
> 

When using a non-ramdisk tmpdir (/var/tmp) with some large tests skipped [1],
the tests run with 2 cores + 4GB memory + ~10GB disk space.
I also tried 2GB and 3GB; both runs were interrupted by the OOM killer.

I will ask aron to review & upload a new version; then we can test on
debci infra and see whether it solves the problem.

[1]: 
https://salsa.debian.org/zfsonlinux-team/zfs/-/commit/cf8e8afe69a0a8f21768415a08b131f8aa9fdc6a

Thanks,
--
Shengqi Chen



Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-12 Thread 陈 晟祺
Hi,

> On 12 April 2024, at 12:48, Paul Gevers wrote:
> 
> Hi,
> 
> On 12-04-2024 4:42 a.m., 陈 晟祺 wrote:
>> - If I limit the test file size to 1G, quite many tests would fail even with 
>> adequate resources
> 
> Ack. To be fair, I was more thinking to make current test conditional on the 
> available free disk space. But yeah, that might also lead to issues as the 
> test might be randomly skipped.
> 

You got the point. I previously thought that the test files were on disk,
but actually they are in tmpfs and consume a huge amount of memory.
That’s why the OOM killer kicks in when the tests write large files.
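
A minimal sketch of how one could check whether a test directory lives on tmpfs (and therefore counts against RAM). It parses text in the /proc/mounts format; the function name and the sample mount table are hypothetical, and escaped characters in mount points (such as \040 for spaces) are not handled.

```python
import posixpath


def fstype_for(path, mounts_text):
    """Return the filesystem type of the mount that holds `path`.

    `mounts_text` uses the /proc/mounts layout:
    "<device> <mount point> <fstype> <options> <dump> <pass>".
    """
    path = posixpath.normpath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # Prefer the longest matching mount point; on ties the later entry wins.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


# Hypothetical mount table for illustration only.
SAMPLE_MOUNTS = """\
/dev/vda1 / ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""

print(fstype_for("/tmp/zfs-test-file", SAMPLE_MOUNTS))
print(fstype_for("/var/tmp/zfs-test-file", SAMPLE_MOUNTS))
```

On a real Linux system the second argument would be the contents of /proc/mounts. Files under a tmpfs mount are backed by memory, which matches the OOM behaviour described above.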

> Good, so 2GB memory is not enough for zfs-linux (I assume you ran this test 
> with 2 cores like I did)

Yes, I always use 2 cores. 

> 
> I agree we shouldn't spend too much time on squeezing it into the *current* 
> defaults. I'm still somewhat hoping that we could squeeze out a somewhat 
> smaller memory defaults than 8 GB: does 4 GB work (and if so, how long does 
> it take)?
> 

I am now trying to run the tests on 2 cores and 4GB memory (and maybe less later).
If the tester itself does not occupy too much RAM, the real resource requirement
is now probably several gigabytes of disk space (currently it’s ~10GB).

I will give more feedback once new results come out.

Thanks,
--
Shengqi Chen



Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-11 Thread Paul Gevers

Hi,

On 12-04-2024 4:42 a.m., 陈 晟祺 wrote:

- If I limit the test file size to 1G, quite many tests would fail even with 
adequate resources


Ack. To be fair, I was more thinking to make current test conditional on 
the available free disk space. But yeah, that might also lead to issues 
as the test might be randomly skipped.



- If I try to skip large_files as you indicated with 2G memory,


Good, so 2GB memory is not enough for zfs-linux (I assume you ran this 
test with 2 cores like I did)



- With my fixes to dependencies, the tests could run to the ending without 
errors on 2 core + 8 GB.


Great. That's progress then.


Therefore I think trying to fit zfs-tests into a normal debci VM might be 
troublesome.


I agree we shouldn't spend too much time on squeezing it into the 
*current* defaults. I'm still somewhat hoping that we could squeeze out 
a somewhat smaller memory defaults than 8 GB: does 4 GB work (and if so, 
how long does it take)?


Paul




Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-11 Thread 陈 晟祺
Hi,

> On 12 April 2024, at 02:39, Paul Gevers wrote:
> 
> Hi
> 
> On 11-04-2024 5:18 p.m., 陈 晟祺 wrote:
>> If possible, could you help to build with latest code on salsa then run 
>> autopkgtest again on a normal debci VM?
> 
> As I'm doing this live on the infrastructure, I don't want to do anything 
> there except testing what's in the archive, sorry.
> 

Sure, this is reasonable.

> My private setup (laptop) is not powerful enough to run this.
> 
> I'm not 100% sure how to instruct you to build a ci.d.n-like image. I 
> think it's:
> $ autopkgtest-build-qemu debian testing
> $ /usr/bin/autopkgtest --no-built-binaries --test-name=zfs-test-suite --user 
> debci zfs-linux -- qemu 
> except I don't know where autopkgtest-build-qemu stores the image.
> 

I am indeed using debci images to ensure reproducibility. So the software 
environment should be the same.

Just more observations here:

- If I limit the test file size to 1G, quite a lot of tests fail even with
  adequate resources.
- If I try to skip large_files as you indicated with 2G memory, the tests
  proceed for a bit longer, but still hang on some later tests. Since there
  are so many tests and I am not familiar with most of them, I have to try
  repeatedly to find out which ones to filter out. Even if I could do so,
  some (other, not seen before) tests would fail unexpectedly. These problems
  might be hard to work around.
- With my fixes to dependencies, the tests run to the end without
  errors on 2 cores + 8 GB.

Therefore I think trying to fit zfs-tests into a normal debci VM might be 
troublesome.

--
Thanks,
Shengqi Chen

Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-11 Thread Paul Gevers

Hi

On 11-04-2024 5:18 p.m., 陈 晟祺 wrote:
If possible, could you help to build with latest code on salsa then run 
autopkgtest again on a normal debci VM?


As I'm doing this live on the infrastructure, I don't want to do 
anything there except testing what's in the archive, sorry.


My private setup (laptop) is not powerful enough to run this.

I'm not 100% sure how to instruct you to build a ci.d.n-like 
image. I think it's:

$ autopkgtest-build-qemu debian testing
$ /usr/bin/autopkgtest --no-built-binaries --test-name=zfs-test-suite \
    --user debci zfs-linux -- qemu

except I don't know where autopkgtest-build-qemu stores the image.

Paul




Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-11 Thread 陈 晟祺
Hi Paul,

On 11 April 2024, at 20:59, Paul Gevers wrote:

Hi,

With the default size of the ramdisk and 2 cpu's the test crashes with:

Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/setup (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/large_files_001_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/large_files_002_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/cleanup (run as root) [00:00] [PASS]
Killed
Killed
Killed
qemu-system-x86_64: terminating on signal 15 from pid 132251 (/usr/bin/python3)
autopkgtest [12:28:46]: ERROR: testbed failure: timed out on command "cat /run/autopkgtest-reboot-mark" (kind: short)
root@ci-worker13:~#

That at least hints that those tests *might* be generating a bit too large 
files to be handled in this case. Maybe worth making these tests conditional on 
free space if they aren't already.


Thanks for your detailed diagnosis. I adjusted a test option to limit the
maximum file size [1]. I also fixed numerous test errors caused by missing
dependencies [2]. Yet I am concerned that some tests might in turn fail due
to insufficient disk space. If so, I will have to ignore some tests on either
side.

If possible, could you help build the latest code on salsa and then run
autopkgtest again on a normal debci VM?
I am also testing that locally.

[1]: 
https://salsa.debian.org/zfsonlinux-team/zfs/-/commit/f6bea9224c4bf734ac381bac36a995dfd33b2078
[2]: 
https://salsa.debian.org/zfsonlinux-team/zfs/-/commit/177d5b2eab39cf8ca0e7bb66d462b4886f2372e4


Thanks,
Shengqi Chen



Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-11 Thread Paul Gevers

Hi,

With the default size of the ramdisk and 2 cpu's the test crashes with:

Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/setup (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/large_files_001_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/large_files_002_pos (run as root) [00:00] [PASS]
Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/cleanup (run as root) [00:00] [PASS]

Killed
Killed
Killed
qemu-system-x86_64: terminating on signal 15 from pid 132251 (/usr/bin/python3)
autopkgtest [12:28:46]: ERROR: testbed failure: timed out on command "cat /run/autopkgtest-reboot-mark" (kind: short)

root@ci-worker13:~#

That at least hints that those tests *might* be generating a bit too 
large files to be handled in this case. Maybe worth making these tests 
conditional on free space if they aren't already.
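
A minimal sketch of such a free-space guard, using Python's standard library. The 10 GiB figure is only an illustrative assumption (roughly the disk usage mentioned elsewhere in this thread), and the helper name is hypothetical; the real zfs-tests would need their own mechanism for skipping.

```python
import shutil
import tempfile


def has_enough_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has the required free space."""
    return shutil.disk_usage(path).free >= required_bytes


# Assumption: about 10 GiB is needed for the large-file tests.
REQUIRED_BYTES = 10 * 1024**3
test_dir = tempfile.gettempdir()

if has_enough_free_space(test_dir, REQUIRED_BYTES):
    print("enough free space: run the large_files tests")
else:
    print("not enough free space: skip the large_files tests")
```

As noted later in the thread, a guard like this can cause tests to be skipped unpredictably, so the skip should at least be logged clearly.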


Paul




Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-11 Thread Paul Gevers

Hi,

Some additional info from my side.

I have just run the following:
root@ci-worker13:~# /usr/bin/autopkgtest --no-built-binaries \
    --test-name=zfs-test-suite --timeout-factor=3 --user debci zfs-linux -- \
    qemu --cpus=2 --ram-size=8192 /var/lib/debci/qemu/testing-amd64.img
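
To compare resource settings systematically, the same invocation can be generated for several memory sizes. This sketch only assembles the command line shown above (all flags are taken from it); the helper name and the list of RAM sizes are assumptions, and nothing is executed.

```python
import shlex


def autopkgtest_command(cpus, ram_mb, image,
                        test_name="zfs-test-suite", timeout_factor=3):
    """Build the autopkgtest argv for a qemu testbed with the given resources."""
    return [
        "/usr/bin/autopkgtest", "--no-built-binaries",
        f"--test-name={test_name}", f"--timeout-factor={timeout_factor}",
        "--user", "debci", "zfs-linux", "--",
        "qemu", f"--cpus={cpus}", f"--ram-size={ram_mb}", image,
    ]


IMAGE = "/var/lib/debci/qemu/testing-amd64.img"

# Print one command per candidate memory size (in MiB) for manual runs.
for ram_mb in (2048, 4096, 8192):
    print(shlex.join(autopkgtest_command(2, ram_mb, IMAGE)))
```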


The test failed and took 3 hours and 13 minutes:

The test started at 06:32:45
The test ends like:
SKIP cli_root/zfs_unshare/zfs_unshare_005_neg (expected PASS)
SKIP cli_root/zfs_unshare/zfs_unshare_007_pos (expected PASS)
SKIP cli_root/zfs_unshare/zfs_unshare_008_pos (expected PASS)
FAIL cli_root/zpool_destroy/zpool_destroy_002_pos (expected PASS)
FAIL cli_root/zpool_detach/setup (expected PASS)
SKIP cli_root/zpool_detach/zpool_detach_001_neg (expected PASS)
FAIL cli_root/zpool_import/zpool_import_012_pos (expected PASS)
FAIL cli_root/zpool_import/zpool_import_rename_001_pos (expected PASS)
FAIL history/history_007_pos (expected PASS)
FAIL inheritance/inherit_001_pos (expected PASS)
FAIL slog/slog_replay_fs_001 (expected PASS)
FAIL slog/slog_replay_fs_002 (expected PASS)
autopkgtest [09:45:26]: test zfs-test-suite: ---]
autopkgtest [09:45:27]: test zfs-test-suite:  - - - - - - - - - - 
results - - - - - - - - - -

zfs-test-suite   FAIL non-zero exit status 1

Now I wonder if it's the number of cpu's or the memory that caused it to 
finish starting up a new test.
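
As a quick check of the duration quoted above, the elapsed time between the two timestamps in the log can be computed directly. The helper below is a sketch that assumes both timestamps fall within 24 hours of each other.

```python
from datetime import datetime, timedelta


def elapsed(start, end, fmt="%H:%M:%S"):
    """Return the time between two clock readings, allowing for a midnight wrap."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta


# Start time and the time of the final "test zfs-test-suite: ---]" log line.
duration = elapsed("06:32:45", "09:45:26")
print(duration)  # 3:12:41
```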


Paul




Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-11 Thread 陈 晟祺
Hi,

Some additional information and errata here.

I have split the tests into four stanzas as upstream does [1].

The resources of one GitHub Actions runner are actually 4 cores + 16GB memory, not
2 cores + 8GB as I mentioned before. The test can finish within a reasonable
time (3hrs) with such a configuration (although with a few unexpected failures,
which I think can be solved).

I am still trying with fewer resources, especially shrinking the memory.

[1]: 
https://salsa.debian.org/zfsonlinux-team/zfs/-/commit/7afeda495fa5b8129dfac45aef6340f46fbaf3a6

--
Thanks, 
Shengqi Chen.



Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-08 Thread 陈 晟祺
Hi,

> On 9 April 2024, at 02:51, Paul Gevers wrote:
> 
> Our timeout is 10000 seconds, so 2:47 hours, per autopkgtest stanza (overall 
> it's 8 hours). If the test is going to take longer, it will fail anyway. So 
> maybe it was just still running? I'm a bit hesitant, particularly about the 
> memory, to make much bigger VMs, because most tests don't need it and it 
> limits the number of VMs we can make. We need to strike a nice balance (or 
> fix https://salsa.debian.org/ci-team/debci/-/issues/166#note_451831 and add 
> zfs-linux to a "huge" list)
> 

I totally understand your consideration. I think it would be great if we could 
specify more detailed resource requirements on test metadata (thus not wasting 
resources on small tests).

> 
> Well, if we can't run the test on our infra, we could disable it, but what's 
> the point of having the autopkgtest then? (If you split the tests over 
> multiple stanzas, you get the 2:47 hours per set. Does that help?)
> 

It might help. The upstream test on GitHub Actions is actually split into 
four parts [1], each taking ~1hr. I can (and plan to) integrate that into the 
Debian tests.

> Let me try to see if I can have debci create larger VM's for us and let me 
> try your package again. What are the resources you use yourself for the test 
> and how long does it take in that case?
> 

My testing resources are maybe not that representative (20 cores + 32GB 
memory); the run takes about the same time (3hr40min) as with the upstream 
configuration (4 cores + 7GB).
I will try with fewer resources soon and give you more information.

[1]: 
https://github.com/openzfs/zfs/blob/master/.github/workflows/scripts/setup-functional.sh

--
Thanks,
Shengqi Chen



Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-08 Thread Paul Gevers

Hi,

On 08-04-2024 3:51 a.m., 陈 晟祺 wrote:

With resources limited to one CPU (AMD EPYC 7551) and 2G memory,
my local test could now reproduce the test hang and following time out error.


Ouch.


I think it is caused by insufficient resources (e.g. the OOM killer, but I am not 
sure).
Even if we can work around it, the test process would still be too slow to 
finish.

Is it possible to allocate more resources for the test? For reference, openzfs 
uses
GitHub-hosted workflow runners [1] for test. Each runner has 2 CPU cores and
7 GB memory, under which configuration the whole test still takes ~4hrs.


Our timeout is 10000 seconds, so 2:47 hours, per autopkgtest stanza 
(overall it's 8 hours). If the test is going to take longer, it will 
fail anyway. So maybe it was just still running? I'm a bit hesitant, 
particularly about the memory, to make much bigger VMs, because most 
tests don't need it and it limits the number of VMs we can make. We 
need to strike a nice balance (or fix 
https://salsa.debian.org/ci-team/debci/-/issues/166#note_451831 and add 
zfs-linux to a "huge" list)



If not, is there any way to mark the test as optional (thus not causing an RC bug)?
Otherwise our last resort would be to disable the test completely.


Well, if we can't run the test on our infra, we could disable it, but 
what's the point of having the autopkgtest then? (If you split the tests 
over multiple stanzas, you get the 2:47 hours per set. Does that help?)
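
A small sketch of the time budget being discussed. It reads the per-stanza timeout as 10000 seconds (an assumption: this is autopkgtest's default test timeout as far as I know, and it matches the roughly 2:47 h in the original report) and uses the 8-hour overall limit stated above. The stanza durations are illustrative.

```python
from datetime import timedelta

STANZA_TIMEOUT_S = 10000        # assumed per-stanza timeout in seconds
OVERALL_TIMEOUT_S = 8 * 3600    # overall limit of 8 hours


def fits_ci_limits(stanza_durations_s):
    """Return True if every stanza and the whole run stay within the limits."""
    each_ok = all(d <= STANZA_TIMEOUT_S for d in stanza_durations_s)
    total_ok = sum(stanza_durations_s) <= OVERALL_TIMEOUT_S
    return each_ok and total_ok


print(timedelta(seconds=STANZA_TIMEOUT_S))   # 2:46:40
print(fits_ci_limits([4 * 3600]))            # one ~4 h stanza
print(fits_ci_limits([3600] * 4))            # four ~1 h stanzas
```

Under these assumptions a single four-hour stanza exceeds the per-stanza limit, while four one-hour stanzas fit both limits.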


Let me try to see if I can have debci create larger VM's for us and let 
me try your package again. What are the resources you use yourself for 
the test and how long does it take in that case?


Paul




Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-07 Thread 陈 晟祺
Hi Paul,

> On 7 April 2024, at 21:10, Paul Gevers wrote:
> 
> Hi,
> 
> The host that runs this is an m3-large instance at equinix [1].
> 
> We create the qemu image with autopkgtest-build-qemu (default settings as far 
> as I know).
> 
> From within the testbed:
> root@host:~# lscpu
> Architecture:            x86_64
>   CPU op-mode(s):        32-bit, 64-bit
>   Address sizes:         48 bits physical, 48 bits virtual
>   Byte Order:            Little Endian
> CPU(s):                  1
>   On-line CPU(s) list:   0
> Vendor ID:               AuthenticAMD
>   BIOS Vendor ID:        QEMU
>   Model name:            AMD EPYC 7502P 32-Core Processor
>     BIOS Model name:     pc-i440fx-7.2  CPU @ 2.0GHz
>     BIOS CPU family:     1
>     CPU family:          23
>     Model:               49
>     Thread(s) per core:  1
>     Core(s) per socket:  1
>     Socket(s):           1
> 
> root@host:~# lsmem
> RANGE      SIZE  STATE REMOVABLE BLOCK
> 0x-0x7fff    2G online       yes  0-15
> 
> Memory block size:   128M
> Total online memory:   2G
> Total offline memory:  0B
> 

With resources limited to one CPU (AMD EPYC 7551) and 2G memory,
my local test can now reproduce the test hang and the subsequent timeout error.

I think it is caused by insufficient resources (e.g. the OOM killer, but I am
not sure). Even if we can work around it, the test process would still be too
slow to finish.

Is it possible to allocate more resources for the test? For reference, openzfs
uses GitHub-hosted workflow runners [1] for its tests. Each runner has 2 CPU
cores and 7 GB memory, under which configuration the whole test still takes
~4hrs.

If not, is there any way to mark the test as optional (thus not causing an RC bug)?
Otherwise our last resort would be to disable the test completely.

[1]: 
https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-private-repositories
[2]: 
https://github.com/openzfs/zfs/blob/master/.github/workflows/scripts/setup-functional.sh


Thanks,
Shengqi Chen



Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-07 Thread Paul Gevers

Hi,

On 07-04-2024 2:29 p.m., 陈 晟祺 wrote:

Could you please provide more detailed information on the test settings on 
ci.d.o.?
E.g., CPU type, #cores, memory size, etc.


The host that runs this is an m3-large instance at equinix [1].

We create the qemu image with autopkgtest-build-qemu (default settings 
as far as I know).


From within the testbed:
root@host:~# lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          48 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   1
  On-line CPU(s) list:    0
Vendor ID:                AuthenticAMD
  BIOS Vendor ID:         QEMU
  Model name:             AMD EPYC 7502P 32-Core Processor
    BIOS Model name:      pc-i440fx-7.2  CPU @ 2.0GHz
    BIOS CPU family:      1
    CPU family:           23
    Model:                49
    Thread(s) per core:   1
    Core(s) per socket:   1
    Socket(s):            1
    Stepping:             0
    BogoMIPS:             4990.62
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
                          pge mca cmov pat pse36 clflush mmx fxsr sse sse2
                          syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm
                          rep_good nopl cpuid extd_apicid tsc_known_freq pni
                          pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe
                          popcnt tsc_deadline_timer aes xsave avx f16c rdrand
                          hypervisor lahf_lm cmp_legacy svm cr8_legacy abm
                          sse4a misalignsse 3dnowprefetch osvw perfctr_core
                          ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust
                          bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb
                          sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr
                          wbnoinvd arat npt lbrv nrip_save tsc_scale
                          vmcb_clean pausefilter pfthreshold v_vmsave_vmload
                          vgif umip rdpid arch_capabilities
Virtualization features:
  Virtualization:         AMD-V
  Hypervisor vendor:      KVM
  Virtualization type:    full
Caches (sum of all):
  L1d:                    64 KiB (1 instance)
  L1i:                    64 KiB (1 instance)
  L2:                     512 KiB (1 instance)
  L3:                     16 MiB (1 instance)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Retbleed:               Mitigation; untrained return thunk; SMT disabled
  Spec rstack overflow:   Vulnerable: Safe RET, no microcode
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected
root@host:~# lsmem
RANGE      SIZE  STATE REMOVABLE BLOCK
0x-0x7fff    2G online       yes  0-15

Memory block size:   128M
Total online memory:   2G
Total offline memory:  0B


Paul

[1] https://deploy.equinix.com/product/servers/m3-large/




Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-07 Thread 陈 晟祺
Hi,

> On 7 April 2024, at 17:23, Paul Gevers wrote:
> 
> Dear maintainer(s),
> 
> Your package has an autopkgtest, great. I recently added support for 
> isolation-machine tests on ci.debian.net for amd64 and added your package to 
> the list to use that. However, it fails because the zfs-test-suite test times 
> out after 2:47h (it seems to hang by the looks of the log). Can you please 
> investigate the situation and fix it? I copied some of the output at the 
> bottom of this report.
> 

Thanks for your work! I have long waited for the isolation-machine tag to be 
available.

> The release team has announced [1] that failing autopkgtest on amd64 and 
> arm64 are considered RC in testing, but because machine-isolation support by 
> ci.debian.net is new I have not marked this bug as serious (yet).
> 
> Because the test doesn't fail, but tmpfails (might be a bug in autopkgtest), 
> I've reverted the preferred backend for zfs-linux back to lxc until this bug 
> is closed.
> 

I am not yet able to reproduce the hang on my local testing environment.
Could you please provide more detailed information on the test settings on 
ci.d.o.?
E.g., CPU type, #cores, memory size, etc.

Thanks,
Shengqi Chen



Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)

2024-04-07 Thread Paul Gevers

Source: zfs-linux
Version: 2.2.3-1
Severity: important
User: debian...@lists.debian.org
Usertags: isolation-machine timeout

Dear maintainer(s),

Your package has an autopkgtest, great. I recently added support for 
isolation-machine tests on ci.debian.net for amd64 and added your 
package to the list to use that. However, it fails because the 
zfs-test-suite test times out after 2:47h (it seems to hang by the looks 
of the log). Can you please investigate the situation and fix it? I 
copied some of the output at the bottom of this report.


The release team has announced [1] that failing autopkgtest on amd64 and 
arm64 are considered RC in testing, but because machine-isolation 
support by ci.debian.net is new I have not marked this bug as serious (yet).


Because the test doesn't fail, but tmpfails (might be a bug in 
autopkgtest), I've reverted the preferred backend for zfs-linux back to 
lxc until this bug is closed.


More information about this bug and the reason for filing it can be 
found on 
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation


Paul

[1] https://lists.debian.org/debian-devel-announce/2019/07/msg2.html

https://ci.debian.net/packages/z/zfs-linux/testing/amd64/44891484/

4599s Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zpool_initialize/zpool_initialize_verify_checksums (run as root) [00:53] [PASS]
4604s Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zpool_initialize/zpool_initialize_verify_initialized (run as root) [00:04] [PASS]
4604s Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zpool_initialize/cleanup (run as root) [00:00] [PASS]
4605s Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zpool_labelclear/zpool_labelclear_active (run as root) [00:00] [PASS]
4606s Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zpool_labelclear/zpool_labelclear_exported (run as root) [00:00] [PASS]
10970s autopkgtest [08:58:15]: ERROR: timed out on command "su -s 
/bin/bash root -c set -e; exec /tmp/autopkgtest.ho3dFf/wrapper.sh 
--artifacts=/tmp/autopkgtest.ho3dFf/zfs-test-suite-artifacts 
--chdir=/tmp/autopkgtest.ho3dFf/build.4kv/src 
--env=DEB_BUILD_OPTIONS=parallel=1 --env=DEBIAN_FRONTEND=noninteractive 
--env=LANG=C.UTF-8 --unset-env=LANGUAGE --unset-env=LC_ADDRESS 
--unset-env=LC_ALL --unset-env=LC_COLLATE --unset-env=LC_CTYPE 
--unset-env=LC_IDENTIFICATION --unset-env=LC_MEASUREMENT 
--unset-env=LC_MESSAGES --unset-env=LC_MONETARY --unset-env=LC_NAME 
--unset-env=LC_NUMERIC --unset-env=LC_PAPER --unset-env=LC_TELEPHONE 
--unset-env=LC_TIME --script-pid-file=/tmp/autopkgtest_script_pid 
--source-profile --stderr=/tmp/autopkgtest.ho3dFf/zfs-test-suite-stderr 
--stdout=/tmp/autopkgtest.ho3dFf/zfs-test-suite-stdout 
--tmp=/tmp/autopkgtest.ho3dFf/autopkgtest_tmp 
--env=AUTOPKGTEST_NORMAL_USER=debci --env=ADT_NORMAL_USER=debci 
--make-executable=/tmp/autopkgtest.ho3dFf/build.4kv/src/debian/tests/zfs-test-suite 
-- /tmp/autopkgtest.ho3dFf/build.4kv/src/debian/tests/zfs-test-suite" 
(kind: test)

10971s autopkgtest [08:58:16]: test zfs-test-suite: ---]

