Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, On 19-05-2024 2:55 p.m., 陈 晟祺 wrote: My concern now is that the results do not seem to be stable or reproducible. That's a recurring problem in lots of places, yes. Is there any convention for handling such a situation? E.g., should I mark all zfs-test-suite-x as flaky and treat them as reference only? It depends ;) The disadvantage of marking the whole test stanza as flaky is that it won't block regressions at all. Depending on how the test (I mean per stanza in d/t/control) is set up, it makes more sense to mark individual tests as flaky than the whole suite/stanza. However, if there's not enough granularity, that doesn't really help. Then there's the infrastructure argument. If your test is not a cheap one, running a long test only to have it fail flakily is a rather high price for very little gain. Then it might make more sense to not run the test by default (add an unknown restriction, for example) and only use the test for manual checking, where you can judge (or rerun) the test as you see fit. In the end it's your decision. All I can say is that tests that are flaky enough (my threshold is roughly a failure rate worse than 1 in 8) and not marked as such are considered RC buggy. Paul OpenPGP_signature.asc Description: OpenPGP digital signature
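Paul's rule of thumb (a test failing in more than roughly 1 of 8 runs, without being marked flaky, counts as RC buggy) can be checked against a tally of past runs. A minimal sketch, assuming you count failed and total runs yourself (e.g. from the ci.debian.net history page); the function names are hypothetical, and a handful of runs only gives a rough estimate.

```python
from fractions import Fraction

# Rough threshold from this thread: failing more often than about 1 run in 8,
# while not marked flaky, is treated as RC buggy.
RC_FLAKY_THRESHOLD = Fraction(1, 8)


def failure_rate(failed: int, total: int) -> Fraction:
    """Observed failure rate as an exact fraction (failed / total)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return Fraction(failed, total)


def worse_than_threshold(failed: int, total: int,
                         threshold: Fraction = RC_FLAKY_THRESHOLD) -> bool:
    """True if the observed failure rate exceeds the threshold."""
    return failure_rate(failed, total) > threshold


if __name__ == "__main__":
    # Counts reported in this thread: 2 failed runs out of 7.
    rate = failure_rate(2, 7)
    print(f"{rate} = {float(rate):.3f}; worse than 1/8: {worse_than_threshold(2, 7)}")
```

With 2 failures in 7 runs the rate is 2/7 (about 0.286), well above 1/8, so by this rule the stanza would need either fixing or a flaky marking.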
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, > On 19 May 2024 at 13:51, Paul Gevers wrote: > > I already noticed yesterday and had it run; it failed. (Currently) top one > here: https://ci.debian.net/packages/z/zfs-linux/testing/amd64/ > My concern now is that the results do not seem to be stable or reproducible. Seven test runs have happened since yesterday, of which two failed on zfs-test-suite-foo. Most of these failures may be false positives, and a re-run typically makes them pass. Almost every single test in the failed list can pass independently, but the whole test sequence is failure-prone. I have also observed similar behavior when testing locally (and also in upstream CI). Is there any convention for handling such a situation? E.g., should I mark all zfs-test-suite-x as flaky and treat them as reference only? Thanks, Shengqi Chen
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, On 19-05-2024 6:25 a.m., 陈 晟祺 wrote: I have made more adjustments, basically skipping some flaky tests in VMs. Now the new version 2.2.4-1 is in the archive; please try it again when available. I already noticed yesterday and had it run; it failed. (Currently) top one here: https://ci.debian.net/packages/z/zfs-linux/testing/amd64/ Paul
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, > > The test ran. Unfortunately zfs-test-suite-1 failed. > I have made more adjustments, basically skipping some flaky tests in VMs. Now the new version 2.2.4-1 is in the archive; please try it again when available. Thanks, Shengqi Chen
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, On 18-04-2024 10:25 p.m., Paul Gevers wrote: I'll hopefully do the changes tomorrow. (RL work is a bit busy at the moment.) The test ran. Unfortunately zfs-test-suite-1 failed. https://ci.debian.net/packages/z/zfs-linux/unstable/amd64/45683824/ 4089s Results Summary 4089s PASS 681 4089s FAIL 2 4089s SKIP 3 Seems like we're nearly there. (I made a tiny mistake in that run, as I had 8GB RAM; I have now lowered it to 4GB, which will be the setting until further discussion is warranted.) Paul
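When comparing runs like this one, it can help to pull the PASS/FAIL/SKIP counts out of the log automatically. A minimal sketch, assuming the line layout shown in the excerpt above (an optional elapsed-seconds prefix, a result keyword, a count); the real zfs-tests summary may use tabs or list further result categories, which this regex would not count. The function name is hypothetical.

```python
import re
from typing import Dict

# Matches lines such as "4089s PASS 681": an optional elapsed-time prefix
# (as in the ci.debian.net log excerpt), a result keyword and a count.
_COUNT_LINE = re.compile(r"^(?:\d+s\s+)?(PASS|FAIL|SKIP)\s+(\d+)$")


def parse_results_summary(log_text: str) -> Dict[str, int]:
    """Return the PASS/FAIL/SKIP counts listed after 'Results Summary'."""
    counts: Dict[str, int] = {}
    in_summary = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.endswith("Results Summary"):
            in_summary = True
            continue
        if not in_summary:
            continue
        m = _COUNT_LINE.match(line)
        if m:
            counts[m.group(1)] = int(m.group(2))
        elif counts:
            # The first non-count line after the counts ends the summary block.
            break
    return counts


if __name__ == "__main__":
    sample = "4089s Results Summary\n4089s PASS 681\n4089s FAIL 2\n4089s SKIP 3\n"
    counts = parse_results_summary(sample)
    print(counts, "total:", sum(counts.values()))
```

For the run above this yields 681 passed, 2 failed and 3 skipped, 686 tests in total.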
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, On 14-04-2024 5:14 a.m., 陈 晟祺 wrote: I will have Aron review & upload a new version; then we can test on debci infra and see whether it solves the problem. I forgot I promised changes to the settings. Without those changes, it doesn't end nicely: https://ci.debian.net/packages/z/zfs-linux/unstable/amd64/45540021/ I'll hopefully do the changes tomorrow. (RL work is a bit busy at the moment.) Paul
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, On 14-04-2024 5:14 a.m., 陈 晟祺 wrote: When using a non-ramdisk tmpdir (/var/tmp) and with some large tests skipped [1], the tests run with 2 cores + 4GB memory + ~10GB disk space. I also tried 2GB / 3GB, and both were interrupted by the OOM killer. So, let's settle on 2+4 for now. That sounds like a value we could very reasonably support. I'll configure our setup for that. I will have Aron review & upload a new version; then we can test on debci infra and see whether it solves the problem. Ack. Paul
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Control: tags -1 + pending Hi, > On 13 Apr 2024 at 01:29, 陈 晟祺 wrote: > > I am now trying to run tests on 2 cores and 4GB memory (and maybe less later). > If the tester itself does not occupy too much RAM, the real requirement for resources > is now probably several gigabytes of disk space (currently it’s ~10GB). > > I will give more feedback once new results come out. > When using a non-ramdisk tmpdir (/var/tmp) and with some large tests skipped [1], the tests run with 2 cores + 4GB memory + ~10GB disk space. I also tried 2GB / 3GB, and both were interrupted by the OOM killer. I will have Aron review & upload a new version; then we can test on debci infra and see whether it solves the problem. [1]: https://salsa.debian.org/zfsonlinux-team/zfs/-/commit/cf8e8afe69a0a8f21768415a08b131f8aa9fdc6a Thanks, -- Shengqi Chen
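Given the settled minimum (2 cores + 4 GB, with 2 GB and 3 GB hitting the OOM killer), a test wrapper could check the testbed size up front instead of failing hours later. A minimal sketch, assuming Linux's /proc/meminfo; since MemTotal reports slightly less than the RAM given to the VM, the 90% margin below is my assumption, and the names are hypothetical.

```python
import os
import re
from typing import Optional

# Testbed size settled on in this thread: 2 cores + 4 GB RAM.
# MemTotal is a little below the VM's configured RAM, so this sketch accepts
# 90% of 4 GiB (assumed margin; still above the 3 GB that hit the OOM killer).
MIN_CPUS = 2
MIN_MEM_KIB = 4 * 1024 * 1024 * 9 // 10


def mem_total_kib(meminfo_text: str) -> Optional[int]:
    """Extract MemTotal (in kB, as /proc/meminfo reports it)."""
    m = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def meets_minimum(cpus: Optional[int], mem_kib: Optional[int]) -> bool:
    """True if both CPU count and memory reach the minimum above."""
    if cpus is None or mem_kib is None:
        return False
    return cpus >= MIN_CPUS and mem_kib >= MIN_MEM_KIB


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            mem = mem_total_kib(f.read())
    except OSError:
        mem = None  # not Linux, or /proc not mounted
    cpus = os.cpu_count()
    print(f"cpus={cpus} MemTotal={mem} kB ok={meets_minimum(cpus, mem)}")
```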
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, > On 12 Apr 2024 at 12:48, Paul Gevers wrote: > > Hi, > > On 12-04-2024 4:42 a.m., 陈 晟祺 wrote: >> - If I limit the test file size to 1G, quite a few tests fail even with >> adequate resources > > Ack. To be fair, I was thinking more of making the current test conditional on the > available free disk space. But yeah, that might also lead to issues as the > test might be randomly skipped. > You got the point. I previously thought that the test files were on disk, but they are actually in tmpfs and consume a huge amount of memory. That’s why the OOM killer kicks in when tests write large files. > Good, so 2GB memory is not enough for zfs-linux (I assume you ran this test > with 2 cores like I did) Yes, I always use 2 cores. > > I agree we shouldn't spend too much time on squeezing it into the *current* > defaults. I'm still somewhat hoping that we could squeeze out a somewhat > smaller memory default than 8 GB: does 4 GB work (and if so, how long does > it take)? > I am now trying to run tests on 2 cores and 4GB memory (and maybe less later). If the tester itself does not occupy too much RAM, the real requirement for resources is now probably several gigabytes of disk space (currently it’s ~10GB). I will give more feedback once new results come out. Thanks, -- Shengqi Chen
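Since the OOM kills came from test files landing in tmpfs, a quick way to tell whether a candidate test directory is RAM-backed is to look up its mount in /proc/mounts. A minimal sketch, assuming Linux; the function name is hypothetical, and mount points containing spaces (which /proc/mounts escapes as octal sequences) are not decoded.

```python
import os
from typing import Optional


def fstype_for(path: str, mounts_text: str) -> Optional[str]:
    """Filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts (device, mount point, type, ...).
    """
    best_mnt: Optional[str] = None
    best_type: Optional[str] = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mnt, fstype = fields[1], fields[2]
        covers = path == mnt or path.startswith(mnt.rstrip("/") + "/")
        # The longest matching mount point wins; on ties the later entry wins,
        # since a later mount on the same directory hides the earlier one.
        if covers and (best_mnt is None or len(mnt) >= len(best_mnt)):
            best_mnt, best_type = mnt, fstype
    return best_type


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            mounts = f.read()
    except OSError:
        mounts = ""  # not Linux; fstype_for() will return None
    for d in ("/tmp", "/var/tmp"):
        print(d, fstype_for(os.path.realpath(d), mounts))
```

If /tmp reports `tmpfs` while /var/tmp reports a disk filesystem, that matches the switch to a non-ramdisk tmpdir described in this thread.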
Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, On 12-04-2024 4:42 a.m., 陈 晟祺 wrote: - If I limit the test file size to 1G, quite a few tests fail even with adequate resources Ack. To be fair, I was thinking more of making the current test conditional on the available free disk space. But yeah, that might also lead to issues as the test might be randomly skipped. - If I try to skip large_files as you indicated with 2G memory, Good, so 2GB memory is not enough for zfs-linux (I assume you ran this test with 2 cores like I did) - With my fixes to dependencies, the tests could run to the end without errors on 2 cores + 8 GB. Great. That's progress then. Therefore I think trying to fit zfs-tests into a normal debci VM might be troublesome. I agree we shouldn't spend too much time on squeezing it into the *current* defaults. I'm still somewhat hoping that we could squeeze out a somewhat smaller memory default than 8 GB: does 4 GB work (and if so, how long does it take)? Paul
Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, > On 12 Apr 2024 at 02:39, Paul Gevers wrote: > > Hi > > On 11-04-2024 5:18 p.m., 陈 晟祺 wrote: >> If possible, could you build the latest code on salsa and then run >> autopkgtest again on a normal debci VM? > > As I'm doing this live on the infrastructure, I don't want to do anything > there except testing what's in the archive, sorry. > Sure, this is reasonable. > My private setup (laptop) is not powerful enough to run this. > > I'm not 100% sure how to instruct you to build a ci.d.n like image. I > think it's: > $ autopkgtest-build-qemu debian testing > $ /usr/bin/autopkgtest --no-built-binaries --test-name=zfs-test-suite --user debci zfs-linux -- qemu > except I don't know where autopkgtest-build-qemu stores the image. > I am indeed using debci images to ensure reproducibility, so the software environment should be the same. Just more observations here: - If I limit the test file size to 1G, quite a few tests fail even with adequate resources. - If I try to skip large_files as you indicated with 2G memory, the tests proceed for a bit longer, but still hang on some later tests. Since there are so many tests and I am not familiar with most of them, I have to try repeatedly to find out which to filter out. Even if I could do so, some (other, not seen before) tests would fail unexpectedly. These problems might be hard to work around. - With my fixes to dependencies, the tests could run to the end without errors on 2 cores + 8 GB. Therefore I think trying to fit zfs-tests into a normal debci VM might be troublesome. -- Thanks, Shengqi Chen
Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, On 11-04-2024 5:18 p.m., 陈 晟祺 wrote: If possible, could you build the latest code on salsa and then run autopkgtest again on a normal debci VM? As I'm doing this live on the infrastructure, I don't want to do anything there except testing what's in the archive, sorry. My private setup (laptop) is not powerful enough to run this. I'm not 100% sure how to instruct you to build a ci.d.n like image. I think it's: $ autopkgtest-build-qemu debian testing $ /usr/bin/autopkgtest --no-built-binaries --test-name=zfs-test-suite --user debci zfs-linux -- qemu except I don't know where autopkgtest-build-qemu stores the image. Paul
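The runs discussed in this thread differ mainly in the qemu resource options, so a small helper that assembles the autopkgtest argument list makes it easy to repeat a run at 2 cores + 4 GB, 8 GB, and so on. A sketch that uses only the flags appearing in this thread; the helper name and defaults are mine, it only builds the command (pass the list to `subprocess.run` yourself), and the image path is whatever image you built.

```python
import shlex
from typing import List, Optional


def autopkgtest_qemu_cmd(image: str, package: str = "zfs-linux",
                         test_name: Optional[str] = "zfs-test-suite",
                         cpus: int = 2, ram_mib: int = 4096,
                         timeout_factor: Optional[int] = None,
                         user: str = "debci") -> List[str]:
    """Build an autopkgtest command line for the qemu backend.

    Only flags used in this thread appear here; --ram-size is given in MiB,
    matching the 8192 used for the 8 GB run.
    """
    cmd = ["/usr/bin/autopkgtest", "--no-built-binaries"]
    if test_name:
        cmd.append(f"--test-name={test_name}")
    if timeout_factor is not None:
        cmd.append(f"--timeout-factor={timeout_factor}")
    cmd += ["--user", user, package, "--", "qemu",
            f"--cpus={cpus}", f"--ram-size={ram_mib}", image]
    return cmd


if __name__ == "__main__":
    # 2 cores + 4 GB, the size settled on later in the thread.
    print(shlex.join(autopkgtest_qemu_cmd("/var/lib/debci/qemu/testing-amd64.img")))
```

With `cpus=2, ram_mib=8192, timeout_factor=3` this reproduces the command Paul later ran on ci-worker13.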
Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi Paul, On 11 Apr 2024 at 20:59, Paul Gevers wrote: Hi, With the default size of the ramdisk and 2 CPUs the test crashes with: Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/setup (run as root) [00:00] [PASS] Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/large_files_001_pos (run as root) [00:00] [PASS] Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/large_files_002_pos (run as root) [00:00] [PASS] Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/cleanup (run as root) [00:00] [PASS] Killed Killed Killed qemu-system-x86_64: terminating on signal 15 from pid 132251 (/usr/bin/python3) autopkgtest [12:28:46]: ERROR: testbed failure: timed out on command "cat /run/autopkgtest-reboot-mark" (kind: short) root@ci-worker13:~# That at least hints that those tests *might* be generating files a bit too large to be handled in this case. Maybe worth making these tests conditional on free space if they aren't already. Thanks for your detailed diagnosis. I adjusted a test option to limit the maximum file size [1]. I also fixed numerous test errors caused by missing dependencies [2]. Yet I am concerned that some tests might fail, in turn, due to insufficient disk space. If so, I will have to ignore some tests on either side. If possible, could you build the latest code on salsa and then run autopkgtest again on a normal debci VM? I am also testing that locally. [1]: https://salsa.debian.org/zfsonlinux-team/zfs/-/commit/f6bea9224c4bf734ac381bac36a995dfd33b2078 [2]: https://salsa.debian.org/zfsonlinux-team/zfs/-/commit/177d5b2eab39cf8ca0e7bb66d462b4886f2372e4 Thanks, Shengqi Chen
Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, With the default size of the ramdisk and 2 CPUs the test crashes with: Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/setup (run as root) [00:00] [PASS] Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/large_files_001_pos (run as root) [00:00] [PASS] Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/large_files_002_pos (run as root) [00:00] [PASS] Test: /usr/share/zfs/zfs-tests/tests/functional/large_files/cleanup (run as root) [00:00] [PASS] Killed Killed Killed qemu-system-x86_64: terminating on signal 15 from pid 132251 (/usr/bin/python3) autopkgtest [12:28:46]: ERROR: testbed failure: timed out on command "cat /run/autopkgtest-reboot-mark" (kind: short) root@ci-worker13:~# That at least hints that those tests *might* be generating files a bit too large to be handled in this case. Maybe worth making these tests conditional on free space if they aren't already. Paul
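Paul's suggestion of making the large-file tests conditional on free space could look roughly like this. A minimal sketch, assuming the test files go to /var/tmp and a hypothetical 10 GiB requirement (the real need of large_files would have to be measured); the function names are mine. Note that if the directory is on tmpfs, the reported free space is the tmpfs size limit, which need not be backed by free RAM.

```python
import shutil

GiB = 1024 ** 3


def free_bytes(path: str) -> int:
    """Free space (bytes) available on the filesystem holding `path`."""
    return shutil.disk_usage(path).free


def should_run(free: int, required: int) -> bool:
    """Run a space-hungry test group only if enough space is free."""
    return free >= required


if __name__ == "__main__":
    required = 10 * GiB  # hypothetical requirement, to be measured
    target = "/var/tmp"  # assumption: where the test files are written
    try:
        free = free_bytes(target)
    except OSError:
        free = 0
    verdict = "run" if should_run(free, required) else "skip"
    print(f"{target}: {free / GiB:.1f} GiB free -> {verdict} large_files")
```

As Paul notes later, a check like this can make a test group appear or disappear depending on the worker, so the skip should be visible in the log.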
Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, Some additional info from my side. I have just run the following: root@ci-worker13:~# /usr/bin/autopkgtest --no-built-binaries --test-name=zfs-test-suite --timeout-factor=3 --user debci zfs-linux -- qemu --cpus=2 --ram-size=8192 /var/lib/debci/qemu/testing-amd64.img The test failed after 3 hours and 13 minutes: it started at 06:32:45 and ended like this: SKIP cli_root/zfs_unshare/zfs_unshare_005_neg (expected PASS) SKIP cli_root/zfs_unshare/zfs_unshare_007_pos (expected PASS) SKIP cli_root/zfs_unshare/zfs_unshare_008_pos (expected PASS) FAIL cli_root/zpool_destroy/zpool_destroy_002_pos (expected PASS) FAIL cli_root/zpool_detach/setup (expected PASS) SKIP cli_root/zpool_detach/zpool_detach_001_neg (expected PASS) FAIL cli_root/zpool_import/zpool_import_012_pos (expected PASS) FAIL cli_root/zpool_import/zpool_import_rename_001_pos (expected PASS) FAIL history/history_007_pos (expected PASS) FAIL inheritance/inherit_001_pos (expected PASS) FAIL slog/slog_replay_fs_001 (expected PASS) FAIL slog/slog_replay_fs_002 (expected PASS) autopkgtest [09:45:26]: test zfs-test-suite: ---] autopkgtest [09:45:27]: test zfs-test-suite: - - - - - - - - - - results - - - - - - - - - - zfs-test-suite FAIL non-zero exit status 1 Now I wonder whether it's the number of CPUs or the memory that allowed it to run to the end instead of hanging when starting up a new test. Paul
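The run length follows from the two timestamps, and `--timeout-factor=3` scales the per-test timeout. A small sketch of that arithmetic, assuming the base per-test timeout is autopkgtest's default of 10000 seconds (not stated for this run); the helper name is mine.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Wall-clock time between two HH:MM:SS stamps from the same run."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)  # the run crossed midnight
    return delta


if __name__ == "__main__":
    run = elapsed("06:32:45", "09:45:26")
    # Assumption: a 10000-second base timeout, scaled by --timeout-factor=3.
    budget = timedelta(seconds=10000 * 3)
    print(f"run took {run}, budget {budget}, within budget: {run <= budget}")
```

The run took 3:12:41 (the "3 hours and 13 minutes" above), comfortably inside an 8:20:00 budget, so this run ended on its own rather than by timeout.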
Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, Some additional information and errata here. I have split the tests into four stanzas as upstream does [1]. The resources of one GitHub Actions runner are actually 4 cores + 16GB memory, not 2 cores + 8GB as I mentioned before. The test finishes within a reasonable time (3 hrs) with this configuration (although with a few unexpected failures, which I think can be solved). I am still trying with fewer resources, especially shrinking the memory. [1]: https://salsa.debian.org/zfsonlinux-team/zfs/-/commit/7afeda495fa5b8129dfac45aef6340f46fbaf3a6 -- Thanks, Shengqi Chen.
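Because each stanza gets its own timeout, what matters after splitting is that the slowest stanza stays short. Upstream's actual split is not reproduced here; as one generic way to balance groups when per-group timings are known, here is a greedy longest-first sketch. The group names come from the logs in this thread, but the durations are invented for illustration and the function name is hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def split_into_stanzas(durations: Dict[str, float], n: int) -> List[List[str]]:
    """Greedy longest-first assignment of test groups to n stanzas.

    Each group goes to the stanza with the smallest running total, which keeps
    the slowest stanza (the one most likely to hit its timeout) short.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(n)]
    heapq.heapify(heap)
    stanzas: List[List[str]] = [[] for _ in range(n)]
    # Longest groups first; ties broken by name so the result is deterministic.
    for name, minutes in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        stanzas[i].append(name)
        heapq.heappush(heap, (load + minutes, i))
    return stanzas


if __name__ == "__main__":
    # Hypothetical per-group durations in minutes, for illustration only.
    est = {"cli_root": 90, "large_files": 20, "slog": 25,
           "history": 15, "inheritance": 10}
    for k, s in enumerate(split_into_stanzas(est, 4), start=1):
        print(f"zfs-test-suite-{k}: {sum(est[g] for g in s)} min {s}")
```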
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, > On 9 Apr 2024 at 02:51, Paul Gevers wrote: > > Our timeout is 10000 seconds, so about 2:47 hours, per autopkgtest stanza (overall > it's 8 hours). If the test is going to take longer, it will fail anyway. So > maybe it was just still running? I'm a bit hesitant, particularly about the > memory, to make much bigger VMs, because most tests don't need it and it > limits the number of VMs we can make. We need to strike a nice balance (or > fix https://salsa.debian.org/ci-team/debci/-/issues/166#note_451831 and add > zfs-linux to a "huge" list) > I totally understand your consideration. I think it would be great if we could specify more detailed resource requirements in the test metadata (thus not wasting resources on small tests). > > Well, if we can't run the test on our infra, we could disable it, but what's > the point of having the autopkgtest then? (If you split the tests over > multiple stanzas, you get the 2:47 hours per set. Does that help?) > It might help. Upstream's tests on GitHub Actions are actually split into four parts [1], each taking ~1hr. I can (and plan to) integrate that into the tests in d/t. > Let me try to see if I can have debci create larger VMs for us and let me > try your package again. What are the resources you use yourself for the test > and how long does it take in that case? > My testing resources are maybe not that representative (20 cores + 32GB memory); it takes about the same time (3hr40min) as the upstream configuration (4 cores + 7GB). I will try with fewer resources soon and give you more information. [1]: https://github.com/openzfs/zfs/blob/master/.github/workflows/scripts/setup-functional.sh -- Thanks, Shengqi Chen
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, On 08-04-2024 3:51 a.m., 陈 晟祺 wrote: With resources limited to one CPU (AMD EPYC 7551) and 2G memory, my local test could now reproduce the test hang and the subsequent timeout error. Ouch. I think it is caused by insufficient resources (e.g. OOM killer, but I am not sure). Even if we can work around it, the test process would still be too slow to finish. Is it possible to allocate more resources for the test? For reference, openzfs uses GitHub-hosted workflow runners [1] for testing. Each runner has 2 CPU cores and 7 GB memory, and with that configuration the whole test still takes ~4hrs. Our timeout is 10000 seconds, so about 2:47 hours, per autopkgtest stanza (overall it's 8 hours). If the test is going to take longer, it will fail anyway. So maybe it was just still running? I'm a bit hesitant, particularly about the memory, to make much bigger VMs, because most tests don't need it and it limits the number of VMs we can make. We need to strike a nice balance (or fix https://salsa.debian.org/ci-team/debci/-/issues/166#note_451831 and add zfs-linux to a "huge" list) If not, is there any way to mark the test as optional (thus not causing an RC bug)? Otherwise our worst choice would be to disable the test completely. Well, if we can't run the test on our infra, we could disable it, but what's the point of having the autopkgtest then? (If you split the tests over multiple stanzas, you get the 2:47 hours per set. Does that help?) Let me try to see if I can have debci create larger VMs for us and let me try your package again. What are the resources you use yourself for the test and how long does it take in that case? Paul
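Paul's budget can be checked mechanically: every stanza must fit its own timeout, and the whole run must fit the overall limit. A minimal sketch, assuming a 10000-second per-stanza timeout (autopkgtest's default, about 2:47 h) and reading "overall it's 8 hours" as a cap on the sum of all stanzas; the estimates in the usage block are the rough figures quoted in this thread.

```python
from datetime import timedelta
from typing import Sequence

# Assumptions: 10000 s per stanza (autopkgtest's default test timeout) and
# "overall it's 8 hours" read as a cap on the sum of all stanzas.
PER_STANZA = timedelta(seconds=10000)
OVERALL = timedelta(hours=8)


def fits_budget(estimates: Sequence[timedelta],
                per_stanza: timedelta = PER_STANZA,
                overall: timedelta = OVERALL) -> bool:
    """True if every stanza and the whole run stay within their limits."""
    total = sum(estimates, timedelta())
    return all(e <= per_stanza for e in estimates) and total <= overall


if __name__ == "__main__":
    one_stanza = [timedelta(hours=4)]        # whole suite, ~4 h upstream
    four_stanzas = [timedelta(hours=1)] * 4  # upstream-style split, ~1 h each
    print(PER_STANZA, fits_budget(one_stanza), fits_budget(four_stanzas))
```

A single ~4 h stanza overruns 2:46:40, while four ~1 h stanzas fit both limits, which is why splitting the suite helps.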
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi Paul, > On 7 Apr 2024 at 21:10, Paul Gevers wrote: > > Hi, > > The host that runs this is an m3-large instance at equinix [1]. > > We create the qemu image with autopkgtest-build-qemu (default settings as far > as I know). > > From within the testbed: > root@host:~# lscpu > lscpu > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Address sizes: 48 bits physical, 48 bits virtual > Byte Order: Little Endian > CPU(s): 1 > On-line CPU(s) list: 0 > Vendor ID: AuthenticAMD > BIOS Vendor ID: QEMU > Model name: AMD EPYC 7502P 32-Core Processor > BIOS Model name: pc-i440fx-7.2 CPU @ 2.0GHz > BIOS CPU family: 1 > CPU family: 23 > Model: 49 > Thread(s) per core: 1 > Core(s) per socket: 1 > Socket(s): 1 > > root@host:~# lsmem > lsmem > RANGE SIZE STATE REMOVABLE BLOCK > 0x-0x7fff 2G online yes 0-15 > > Memory block size: 128M > Total online memory: 2G > Total offline memory: 0B > With resources limited to one CPU (AMD EPYC 7551) and 2G memory, my local test could now reproduce the test hang and the subsequent timeout error. I think it is caused by insufficient resources (e.g. OOM killer, but I am not sure). Even if we can work around it, the test process would still be too slow to finish. Is it possible to allocate more resources for the test? For reference, openzfs uses GitHub-hosted workflow runners [1] for testing. Each runner has 2 CPU cores and 7 GB memory, and with that configuration the whole test still takes ~4hrs. If not, is there any way to mark the test as optional (thus not causing an RC bug)? Otherwise our worst choice would be to disable the test completely. [1]: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-private-repositories [2]: https://github.com/openzfs/zfs/blob/master/.github/workflows/scripts/setup-functional.sh Thanks, Shengqi Chen
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, On 07-04-2024 2:29 p.m., 陈 晟祺 wrote: Could you please provide more detailed information on the test settings on ci.d.o.? E.g., CPU type, #cores, memory size, etc. The host that runs this is an m3-large instance at Equinix [1]. We create the qemu image with autopkgtest-build-qemu (default settings as far as I know). From within the testbed:
root@host:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Vendor ID: AuthenticAMD
BIOS Vendor ID: QEMU
Model name: AMD EPYC 7502P 32-Core Processor
BIOS Model name: pc-i440fx-7.2 CPU @ 2.0GHz
BIOS CPU family: 1
CPU family: 23
Model: 49
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
Stepping: 0
BogoMIPS: 4990.62
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean pausefilter pfthreshold v_vmsave_vmload vgif umip rdpid arch_capabilities
Virtualization features:
  Virtualization: AMD-V
  Hypervisor vendor: KVM
  Virtualization type: full
Caches (sum of all):
  L1d: 64 KiB (1 instance)
  L1i: 64 KiB (1 instance)
  L2: 512 KiB (1 instance)
  L3: 16 MiB (1 instance)
NUMA:
  NUMA node(s): 1
  NUMA node0 CPU(s): 0
Vulnerabilities:
  Gather data sampling: Not affected
  Itlb multihit: Not affected
  L1tf: Not affected
  Mds: Not affected
  Meltdown: Not affected
  Mmio stale data: Not affected
  Retbleed: Mitigation; untrained return thunk; SMT disabled
  Spec rstack overflow: Vulnerable: Safe RET, no microcode
  Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds: Not affected
  Tsx async abort: Not affected
root@host:~# lsmem
RANGE SIZE STATE REMOVABLE BLOCK
0x-0x7fff 2G online yes 0-15
Memory block size: 128M
Total online memory: 2G
Total offline memory: 0B
Paul [1] https://deploy.equinix.com/product/servers/m3-large/
Bug#1068559: [Pkg-zfsonlinux-devel] Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Hi, > On 7 Apr 2024 at 17:23, Paul Gevers wrote: > > Dear maintainer(s), > > Your package has an autopkgtest, great. I recently added support for > isolation-machine tests on ci.debian.net for amd64 and added your package to > the list to use that. However, it fails because the zfs-test-suite test times > out after 2:47h (it seems to hang by the looks of the log). Can you please > investigate the situation and fix it? I copied some of the output at the > bottom of this report. > Thanks for your work! I have long waited for the isolation-machine tag to be available. > The release team has announced [1] that failing autopkgtest on amd64 and > arm64 are considered RC in testing, but because machine-isolation support by > ci.debian.net is new I have not marked this bug as serious (yet). > > Because the test doesn't fail, but tmpfails (might be a bug in autopkgtest), > I've reverted the preferred backend for zfs-linux back to lxc until this bug > is closed. > I am not yet able to reproduce the hang in my local testing environment. Could you please provide more detailed information on the test settings on ci.d.o.? E.g., CPU type, #cores, memory size, etc. Thanks, Shengqi Chen
Bug#1068559: zfs-linux: isolation-machine autopkgtest fails: zfs-test-suite times out (hangs?)
Source: zfs-linux Version: 2.2.3-1 Severity: important User: debian...@lists.debian.org Usertags: isolation-machine timeout Dear maintainer(s), Your package has an autopkgtest, great. I recently added support for isolation-machine tests on ci.debian.net for amd64 and added your package to the list to use that. However, it fails because the zfs-test-suite test times out after 2:47h (it seems to hang by the looks of the log). Can you please investigate the situation and fix it? I copied some of the output at the bottom of this report. The release team has announced [1] that failing autopkgtest on amd64 and arm64 are considered RC in testing, but because machine-isolation support by ci.debian.net is new I have not marked this bug as serious (yet). Because the test doesn't fail, but tmpfails (might be a bug in autopkgtest), I've reverted the preferred backend for zfs-linux back to lxc until this bug is closed. More information about this bug and the reason for filing it can be found on https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation Paul [1] https://lists.debian.org/debian-devel-announce/2019/07/msg2.html https://ci.debian.net/packages/z/zfs-linux/testing/amd64/44891484/ 4599s Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zpool_initialize/zpool_initialize_verify_checksums (run as root) [00:53] [PASS] 4604s Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zpool_initialize/zpool_initialize_verify_initialized (run as root) [00:04] [PASS] 4604s Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zpool_initialize/cleanup (run as root) [00:00] [PASS] 4605s Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zpool_labelclear/zpool_labelclear_active (run as root) [00:00] [PASS] 4606s Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zpool_labelclear/zpool_labelclear_exported (run as root) [00:00] [PASS] 10970s autopkgtest [08:58:15]: ERROR: timed out on command "su -s /bin/bash root -c set -e; exec 
/tmp/autopkgtest.ho3dFf/wrapper.sh --artifacts=/tmp/autopkgtest.ho3dFf/zfs-test-suite-artifacts --chdir=/tmp/autopkgtest.ho3dFf/build.4kv/src --env=DEB_BUILD_OPTIONS=parallel=1 --env=DEBIAN_FRONTEND=noninteractive --env=LANG=C.UTF-8 --unset-env=LANGUAGE --unset-env=LC_ADDRESS --unset-env=LC_ALL --unset-env=LC_COLLATE --unset-env=LC_CTYPE --unset-env=LC_IDENTIFICATION --unset-env=LC_MEASUREMENT --unset-env=LC_MESSAGES --unset-env=LC_MONETARY --unset-env=LC_NAME --unset-env=LC_NUMERIC --unset-env=LC_PAPER --unset-env=LC_TELEPHONE --unset-env=LC_TIME --script-pid-file=/tmp/autopkgtest_script_pid --source-profile --stderr=/tmp/autopkgtest.ho3dFf/zfs-test-suite-stderr --stdout=/tmp/autopkgtest.ho3dFf/zfs-test-suite-stdout --tmp=/tmp/autopkgtest.ho3dFf/autopkgtest_tmp --env=AUTOPKGTEST_NORMAL_USER=debci --env=ADT_NORMAL_USER=debci --make-executable=/tmp/autopkgtest.ho3dFf/build.4kv/src/debian/tests/zfs-test-suite -- /tmp/autopkgtest.ho3dFf/build.4kv/src/debian/tests/zfs-test-suite" (kind: test) 10971s autopkgtest [08:58:16]: test zfs-test-suite: ---]
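The "seems to hang" judgement comes from the long silence in the log: the last test output is stamped 4606s and the next line is the timeout at 10970s. A minimal sketch that finds the largest gap between consecutive elapsed-time stamps, assuming every relevant line starts with the `<seconds>s ` prefix seen in the excerpt; the function name is hypothetical, and the sample lines are abbreviated from the excerpt.

```python
import re
from datetime import timedelta
from typing import List, Optional, Tuple

# Lines in the excerpt start with the elapsed seconds, e.g. "4606s Test: ...".
_STAMP = re.compile(r"^(\d+)s\s")


def largest_gap(log_text: str) -> Optional[Tuple[int, int, int]]:
    """Return (gap, before, after) in seconds for the longest silence, or None."""
    stamps: List[int] = [int(m.group(1)) for m in
                         (_STAMP.match(line) for line in log_text.splitlines()) if m]
    best: Optional[Tuple[int, int, int]] = None
    for before, after in zip(stamps, stamps[1:]):
        gap = after - before
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best


if __name__ == "__main__":
    # Abbreviated from the log excerpt in this report.
    sample = "\n".join([
        "4599s Test: zpool_initialize_verify_checksums [PASS]",
        "4604s Test: zpool_initialize_verify_initialized [PASS]",
        "4604s Test: zpool_initialize/cleanup [PASS]",
        "4605s Test: zpool_labelclear_active [PASS]",
        "4606s Test: zpool_labelclear_exported [PASS]",
        "10970s autopkgtest [08:58:15]: ERROR: timed out on command ...",
        "10971s autopkgtest [08:58:16]: test zfs-test-suite: ---]",
    ])
    gap = largest_gap(sample)
    if gap:
        print(f"{gap[0]} s silent between {gap[1]}s and {gap[2]}s "
              f"({timedelta(seconds=gap[0])})")
```

Here the gap is 6364 s (1:46:04) right after zpool_labelclear_exported, pointing at whatever ran next as the likely hang.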