Bug#1065395: [Pkg-opencl-devel] Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space
On 14/03/2024 10.54, Paul Gevers wrote:
> I just did. The biggest rise I saw (and I didn't even stop parallel runners) was ~5 GB, so this version seems fine. Please let me know when the other versions are fixed too.

Thanks. I'll upgrade the other ones over the next days, closing the existing bug again with each upload.

Don't hesitate to yell if you notice something suspiciously straining on the CI ;-)

Andreas
Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space
Hi,

On 12-03-2024 10:18 a.m., Andreas Beckmann wrote:
> On 06/03/2024 06.20, Paul Gevers wrote:
>> Unfortunately the test still takes up to 33 GB at least (see below).
>
> Did you have time to test the -12 version, yet?

I just did. The biggest rise I saw (and I didn't even stop parallel runners) was ~5 GB, so this version seems fine. Please let me know when the other versions are fixed too.

Paul
Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space
Hi Paul,

On 06/03/2024 06.20, Paul Gevers wrote:
> Unfortunately the test still takes up to 33 GB at least (see below).

Did you have time to test the -12 version, yet?

Andreas
Bug#1065395: [Pkg-opencl-devel] Bug#1065395: Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space
On 06/03/2024 11.01, Paul Gevers wrote:
> Hi,
>
> On 06-03-2024 10:30 a.m., Andreas Beckmann wrote:
>> Do you have the log from running that autopkgtest? I have no idea what's happening here. At least the buildd build only used 500 MB.
>
> Attached.

Thanks.

Actually, we were running the testsuite twice: once by the build (that one was new), and once as a regular autopkgtest (that has been done for some time already). So the issue should not have been really new, we just doubled the extreme disk space usage with the -10 upload ;-)

I'm disabling that autopkgtest on s390x now.

Andreas

PS: updating -15+ might be a bit delayed since I found a regression in the upstream branch yesterday ..
Bug#1065395: [Pkg-opencl-devel] Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space
Hi,

On 06-03-2024 10:30 a.m., Andreas Beckmann wrote:
> Do you have the log from running that autopkgtest? I have no idea what's happening here. At least the buildd build only used 500 MB.

Attached.

Paul

debug.log.xz Description: application/xz
Bug#1065395: [Pkg-opencl-devel] Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space
On 06/03/2024 06.20, Paul Gevers wrote:
> Unfortunately the test still takes up to 33 GB at least (see below).

Do you have the log from running that autopkgtest? I have no idea what's happening here. At least the buildd build only used 500 MB.

> By the way, I just noticed this in the -14 log (judging from the name of the test I think that's intentional, but just checking): installing from the -16 package instead of the -14 one:
>
> Get:2 http://deb.debian.org/debian unstable/main s390x spirv-headers all 1.6.1+1.3.275.0-1 [118 kB]

spirv-headers is an independent package; its version does not correspond to an LLVM version (1.6.1 has nothing to do with LLVM 16). But since it evolves independently, it sometimes renames bits (e.g. if internal vendor extensions get finalized and upstreamed), breaking its consumers (but the llvm_release_* branches of llvm-spirv-translator are usually quickly adjusted).

Andreas
Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space
Hi Andreas,

On 05-03-2024 10:16 a.m., Andreas Beckmann wrote:
> But first I'd like to see the s390x build happen and your confirmation that this unbreaks the CI infrastructure. But at least ppc64 and sparc64 built with 500MB instead of 40GB now ;-)
>
> Feel free to block 15-17 temporarily, too.

Unfortunately the test still takes up to 33 GB at least (see below).

Paul

By the way, I just noticed this in the -14 log (judging from the name of the test I think that's intentional, but just checking): installing from the -16 package instead of the -14 one:

Get:2 http://deb.debian.org/debian unstable/main s390x spirv-headers all 1.6.1+1.3.275.0-1 [118 kB]

root@ci-worker-s390x-01:~# while true ; do df -h /scratch/ | grep mapper ; sleep 10 ; done
/dev/mapper/3600507630affd250004a 196G 27G 160G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 27G 160G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 27G 160G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 27G 160G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 27G 160G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 27G 160G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 27G 160G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 27G 159G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 28G 159G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 28G 159G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 28G 159G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 28G 159G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 29G 158G 16% /scratch
/dev/mapper/3600507630affd250004a 196G 29G 158G 16% /scratch
/dev/mapper/3600507630affd250004a 196G 29G 158G 16% /scratch
/dev/mapper/3600507630affd250004a 196G 29G 158G 16% /scratch
/dev/mapper/3600507630affd250004a 196G 29G 157G 16% /scratch
/dev/mapper/3600507630affd250004a 196G 29G 157G 16% /scratch
/dev/mapper/3600507630affd250004a 196G 30G 157G 16% /scratch
/dev/mapper/3600507630affd250004a 196G 30G 157G 16% /scratch
/dev/mapper/3600507630affd250004a 196G 28G 159G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 28G 159G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 28G 159G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 28G 159G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 29G 158G 16% /scratch
/dev/mapper/3600507630affd250004a 196G 34G 153G 19% /scratch
/dev/mapper/3600507630affd250004a 196G 44G 143G 24% /scratch
/dev/mapper/3600507630affd250004a 196G 53G 133G 29% /scratch
/dev/mapper/3600507630affd250004a 196G 62G 124G 34% /scratch
/dev/mapper/3600507630affd250004a 196G 70G 117G 38% /scratch
/dev/mapper/3600507630affd250004a 196G 28G 159G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 28G 159G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 28G 159G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 29G 158G 16% /scratch
/dev/mapper/3600507630affd250004a 196G 27G 160G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 27G 160G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 27G 160G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 27G 160G 15% /scratch
/dev/mapper/3600507630affd250004a 196G 27G 160G 15% /scratch
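For anyone who wants to reduce a `df` log like the one in this message to a single figure, here is a minimal sketch. It assumes each sample has the same column layout as above and that the Used column is reported in whole gigabytes with a `G` suffix. The function name `df_used_summary` is made up for illustration. The reported rise is only the spread between the lowest and highest sample, so it still includes whatever other jobs wrote to the partition at the same time.

```shell
#!/bin/sh
# Summarise `df -h` samples read from stdin: lowest and highest "Used"
# value and the difference between them.
# Assumption: column 3 is "Used" and looks like "27G" (whole gigabytes).
# Lines that do not match (prompts, headers) are ignored.
df_used_summary() {
    awk '
        $3 ~ /^[0-9]+G$/ {
            used = $3 + 0            # "27G" -> 27, awk keeps the numeric prefix
            if (n == 0 || used < min) min = used
            if (n == 0 || used > max) max = used
            n++
        }
        END {
            if (n > 0) printf "min=%dG max=%dG rise=%dG\n", min, max, max - min
        }
    '
}

# Possible usage, with the samples saved to a file first:
#   df_used_summary < df-samples.log
```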
Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space
Hi,

On 05-03-2024 10:16 a.m., Andreas Beckmann wrote:
> Feel free to block 15-17 temporarily, too.

I already did that ;). I'll try when I see 14 in the archive on s390x.

Paul
Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space
On 05/03/2024 07.45, Paul Gevers wrote:
> In the upstream report you mention it's the same across all versions and yesterday we had the same problem with -15. Will you fix the other versions too? Do you want me to clone this bug for that?

I'll update all 4 branches over the next days, no need for extra bugs ;-)

But first I'd like to see the s390x build happen and your confirmation that this unbreaks the CI infrastructure. But at least ppc64 and sparc64 built with 500MB instead of 40GB now ;-)

Feel free to block 15-17 temporarily, too.

Andreas
Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space
Hi Andreas,

Thanks for the upload.

On 04-03-2024 12:26 p.m., Andreas Beckmann wrote:
> Control: forwarded -1 https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/2397
>
> On 03/03/2024 20.52, Paul Gevers wrote:
>> Source: spirv-llvm-translator-14
>> Version: 14.0.0-10

In the upstream report you mention it's the same across all versions and yesterday we had the same problem with -15. Will you fix the other versions too? Do you want me to clone this bug for that?

Paul
Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space
Control: forwarded -1 https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/2397

On 03/03/2024 20.52, Paul Gevers wrote:
> Source: spirv-llvm-translator-14
> Version: 14.0.0-10
>
> Since a couple of days, our workers on s390x are dying because some test is filling up all disk space. Several days ago, I wrongly suspected
> [...]
> One of the suspects started to be spirv-llvm-translator-14, so I ran its autopkgtest manually, while logging disk use every 10 seconds (I started slightly delayed because I monitored the wrong partition first). As you can see below, during the test it grows from 17 GB (at the end) to its peak at 179 GB. That's not acceptable on our infrastructure.
>
> One file I happened to spot on the way was build/test/test_output/DebugInfo/Generic/Output/two-cus-from-same-file.ll.tmp:
> -rw-r--r-- 1 root root 41G Mar 3 19:18 two-cus-from-same-file.ll.tmp
>
> I have added spirv-llvm-translator-14 to our reject-list on s390x.
>
> As this seems to be a rather new issue, I'm wondering if it's due to:
> * Add build-needed autopkgtest for spirv-headers compat check.

Probably.

The buildds report disk usage when building spirv-llvm-translator-* between 400MB and 600MB on all architectures except s390x, ppc64, sparc64, i.e. all the big-endian ones, where it's slightly above 40GB (which very well corresponds to the file you spotted). This started with 14.0.0-2 (i.e. 14.0.0-1 was around 500MB on s390x, too) which had "* Enable build-time tests, ignore failures on !amd64."

So maybe I should skip the build-time tests on big-endian altogether.

Failure rates:
amd64: 0%
i386: <1%
ppc64el: <2%
most: <10%
s390x: >60%
ppc64: >60%

(Upstream seems to test the testsuite only on amd64, https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/1964)

Andreas
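The idea of skipping the build-time tests on big-endian architectures could be sketched roughly as follows. This is an illustration only, not what the package actually does: `should_run_tests` and `run_testsuite` are hypothetical names, and the only real interface relied on is `dpkg-architecture -qDEB_HOST_ARCH_ENDIAN`, which prints `little` or `big` for the host architecture.

```shell
#!/bin/sh
# Decide whether the build-time test suite should run, given the host
# endianness as reported by dpkg-architecture ("little" or "big").
# Hypothetical helper: anything other than "little" skips the tests.
should_run_tests() {
    case "$1" in
        little) return 0 ;;
        *)      return 1 ;;
    esac
}

# Possible usage in a build script (run_testsuite is a placeholder for
# whatever command actually starts the tests):
#   endian=$(dpkg-architecture -qDEB_HOST_ARCH_ENDIAN)
#   if should_run_tests "$endian"; then
#       run_testsuite
#   else
#       echo "skipping build-time tests on $endian-endian host"
#   fi
```

Treating every value other than `little` as "skip" errs on the side of not running the tests, which matches the goal of protecting the disk rather than maximising coverage.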
Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space
Source: spirv-llvm-translator-14
Version: 14.0.0-10
Severity: serious
X-Debbugs-CC: debian...@lists.debian.org
User: debian...@lists.debian.org
Usertags: issue

Dear maintainers,

Since a couple of days, our workers on s390x are dying because some test is filling up all disk space. Several days ago, I wrongly suspected src:fenics-dolfinx (bug #1064995) and added it to our reject-list. It didn't solve the issue, so today I spent more time on finding the culprit. Basically every spike above 40% in the graph [1] is a moment that we see issues like:

Feb 28 05:38:18 ci-worker-s390x-01 debci[1738391]: gzip: /tmp/debci-worker-43383540-cNnbLE372K/autopkgtest-incoming/testing/s390x/f/fenics-dolfinx/43383540/log.gz: No space left on device
Feb 28 05:38:18 ci-worker-s390x-01 debci[1424101]: E: Test for package fenics-dolfinx produced no exit code, aborting

One of the suspects started to be spirv-llvm-translator-14, so I ran its autopkgtest manually, while logging disk use every 10 seconds (I started slightly delayed because I monitored the wrong partition first). As you can see below, during the test it grows from 17 GB (at the end) to its peak at 179 GB. That's not acceptable on our infrastructure.

One file I happened to spot on the way was build/test/test_output/DebugInfo/Generic/Output/two-cus-from-same-file.ll.tmp:
-rw-r--r-- 1 root root 41G Mar 3 19:18 two-cus-from-same-file.ll.tmp

I have added spirv-llvm-translator-14 to our reject-list on s390x.

As this seems to be a rather new issue, I'm wondering if it's due to:
* Add build-needed autopkgtest for spirv-headers compat check.

Or maybe something in the toolchain that broke on s390x?
Paul

[1] https://ci.debian.net/munin/ci-worker-s390x-01/ci-worker-s390x-01/df.html

/dev/mapper/3600507630affd250004a 196G 40G 146G 22% /scratch
/dev/mapper/3600507630affd250004a 196G 49G 138G 27% /scratch
/dev/mapper/3600507630affd250004a 196G 57G 130G 31% /scratch
/dev/mapper/3600507630affd250004a 196G 65G 122G 35% /scratch
/dev/mapper/3600507630affd250004a 196G 66G 121G 36% /scratch
/dev/mapper/3600507630affd250004a 196G 67G 120G 36% /scratch
/dev/mapper/3600507630affd250004a 196G 70G 117G 38% /scratch
/dev/mapper/3600507630affd250004a 196G 73G 114G 40% /scratch
/dev/mapper/3600507630affd250004a 196G 76G 111G 41% /scratch
/dev/mapper/3600507630affd250004a 196G 79G 108G 43% /scratch
/dev/mapper/3600507630affd250004a 196G 83G 104G 45% /scratch
/dev/mapper/3600507630affd250004a 196G 85G 101G 46% /scratch
/dev/mapper/3600507630affd250004a 196G 88G 98G 48% /scratch
/dev/mapper/3600507630affd250004a 196G 92G 95G 50% /scratch
/dev/mapper/3600507630affd250004a 196G 95G 92G 51% /scratch
/dev/mapper/3600507630affd250004a 196G 98G 89G 53% /scratch
/dev/mapper/3600507630affd250004a 196G 101G 86G 54% /scratch
/dev/mapper/3600507630affd250004a 196G 104G 83G 56% /scratch
/dev/mapper/3600507630affd250004a 196G 107G 80G 58% /scratch
/dev/mapper/3600507630affd250004a 196G 65G 122G 35% /scratch
/dev/mapper/3600507630affd250004a 196G 65G 122G 35% /scratch
/dev/mapper/3600507630affd250004a 196G 66G 121G 36% /scratch
/dev/mapper/3600507630affd250004a 196G 68G 118G 37% /scratch
/dev/mapper/3600507630affd250004a 196G 72G 115G 39% /scratch
/dev/mapper/3600507630affd250004a 196G 75G 112G 41% /scratch
/dev/mapper/3600507630affd250004a 196G 78G 109G 42% /scratch
/dev/mapper/3600507630affd250004a 196G 81G 106G 44% /scratch
/dev/mapper/3600507630affd250004a 196G 85G 102G 46% /scratch
/dev/mapper/3600507630affd250004a 196G 87G 99G 47% /scratch
/dev/mapper/3600507630affd250004a 196G 90G 96G 49% /scratch
/dev/mapper/3600507630affd250004a 196G 94G 93G 51% /scratch
/dev/mapper/3600507630affd250004a 196G 97G 90G 52% /scratch
/dev/mapper/3600507630affd250004a 196G 100G 87G 54% /scratch
/dev/mapper/3600507630affd250004a 196G 103G 84G 56% /scratch
/dev/mapper/3600507630affd250004a 196G 106G 81G 57% /scratch
/dev/mapper/3600507630affd250004a 196G 109G 78G 59% /scratch
/dev/mapper/3600507630affd250004a 196G 112G 74G 61% /scratch
/dev/mapper/3600507630affd250004a 196G 116G 71G 63% /scratch
/dev/mapper/3600507630affd250004a 196G 119G 68G 64% /scratch
/dev/mapper/3600507630affd250004a 196G 123G 64G 66% /scratch
/dev/mapper/3600507630affd250004a 196G 126G 61G 68%
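The 41G temporary file in the report above was spotted by chance. A small helper along the following lines could list such files systematically while a test run is in progress. It is only a sketch: `list_large_files` is a made-up name, and the 1 GiB threshold in the usage comment is an arbitrary assumption. It relies only on POSIX `find`, where `-size +Nc` matches files larger than N bytes.

```shell
#!/bin/sh
# List regular files below a directory that are larger than a byte limit.
# $1: directory to scan, $2: size limit in bytes (both required).
list_large_files() {
    dir=$1
    limit_bytes=$2
    # "-size +Nc" selects files whose size exceeds N bytes.
    find "$dir" -type f -size +"${limit_bytes}"c -print
}

# Possible usage against the test output tree named in the report,
# with an assumed threshold of 1 GiB:
#   list_large_files build/test/test_output 1073741824
```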