Bug#1065395: [Pkg-opencl-devel] Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space

2024-03-14 Thread Andreas Beckmann

On 14/03/2024 10.54, Paul Gevers wrote:
I just did. The biggest rise I saw (and I didn't even stop parallel 
runners) was ~5 GB, so this version seems fine.


Please let me know when the other versions are fixed too.


Thanks. I'll upgrade the other ones over the next days, closing the 
existing bug again with each upload.


Don't hesitate to yell if you notice something suspiciously straining on 
the CI ;-)



Andreas



Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space

2024-03-14 Thread Paul Gevers

Hi,

On 12-03-2024 10:18 a.m., Andreas Beckmann wrote:

On 06/03/2024 06.20, Paul Gevers wrote:

Unfortunately the test still takes up to 33 GB at least (see below).


Did you have time to test the -12 version, yet?


I just did. The biggest rise I saw (and I didn't even stop parallel 
runners) was ~5 GB, so this version seems fine.


Please let me know when the other versions are fixed too.

Paul



Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space

2024-03-12 Thread Andreas Beckmann

Hi Paul,

On 06/03/2024 06.20, Paul Gevers wrote:

Unfortunately the test still takes up to 33 GB at least (see below).


Did you have time to test the -12 version, yet?


Andreas



Bug#1065395: [Pkg-opencl-devel] Bug#1065395: Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space

2024-03-06 Thread Andreas Beckmann

On 06/03/2024 11.01, Paul Gevers wrote:

Hi,

On 06-03-2024 10:30 a.m., Andreas Beckmann wrote:

Do you have the log from running that autopkgtest?
I have no idea what's happening here. At least the buildd build only 
used 500 MB.


Attached.


Thanks. Actually, we were running the testsuite twice. Once by the build 
(that one was new), and once as regular autopkgtest (that has been done 
for some time already). So the issue should not have been really new, we 
just doubled the extreme disk space usage with the -10 upload ;-)


I'm disabling that autopkgtest on s390x now.
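
One way such a restriction can be expressed is with the Architecture field of a stanza in debian/tests/control. This is only a sketch: the test name below is hypothetical, and it is an assumption that the package uses this mechanism rather than some other one.

```
# Hypothetical stanza; "spirv-headers-compat" is an illustrative test name.
Tests: spirv-headers-compat
Restrictions: build-needed
# Assumption: skip this test on s390x testbeds only.
Architecture: !s390x
```

With a negated architecture like this, the test would be skipped on s390x and still run everywhere else.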


Andreas

PS: updating -15+ might be a bit delayed since I found a regression in 
the upstream branch yesterday ..




Bug#1065395: [Pkg-opencl-devel] Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space

2024-03-06 Thread Paul Gevers

Hi,

On 06-03-2024 10:30 a.m., Andreas Beckmann wrote:

Do you have the log from running that autopkgtest?
I have no idea what's happening here. At least the buildd build only 
used 500 MB.


Attached.

Paul



debug.log.xz
Description: application/xz



Bug#1065395: [Pkg-opencl-devel] Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space

2024-03-06 Thread Andreas Beckmann

On 06/03/2024 06.20, Paul Gevers wrote:

Unfortunately the test still takes up to 33 GB at least (see below).


Do you have the log from running that autopkgtest?
I have no idea what's happening here. At least the buildd build only 
used 500 MB.


By the way, I just noticed this in the -14 log: it is installing from the 
-16 package instead of the -14 one (judging from the name of the test I 
think that's intentional, but just checking):
Get:2 http://deb.debian.org/debian unstable/main s390x spirv-headers all 1.6.1+1.3.275.0-1 [118 kB]


spirv-headers is an independent package; its version does not 
correspond to an LLVM version (1.6.1 has nothing to do with LLVM 16). 
But since it evolves independently, it sometimes renames bits (e.g. when 
internal vendor extensions get finalized and upstreamed), breaking its 
consumers (but the llvm_release_* branches of SPIRV-LLVM-Translator are 
usually adjusted quickly).


Andreas



Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space

2024-03-05 Thread Paul Gevers

Hi Andreas,

On 05-03-2024 10:16 a.m., Andreas Beckmann wrote:
But first I'd like to see the s390x build happen and your confirmation 
that this unbreaks the CI infrastructure. But at least ppc64 and sparc64 
built with 500MB instead of 40GB now ;-)

Feel free to block 15-17 temporarily, too.


Unfortunately the test still takes up to 33 GB at least (see below).

Paul
By the way, I just noticed this in the -14 log: it is installing from the 
-16 package instead of the -14 one (judging from the name of the test I 
think that's intentional, but just checking):
Get:2 http://deb.debian.org/debian unstable/main s390x spirv-headers all 1.6.1+1.3.275.0-1 [118 kB]



root@ci-worker-s390x-01:~# while true ; do df -h /scratch/ | grep mapper ; sleep 10 ; done
/dev/mapper/3600507630affd250004a  196G   27G  160G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   27G  160G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   27G  160G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   27G  160G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   27G  160G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   27G  160G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   27G  160G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   27G  159G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   28G  159G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   28G  159G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   28G  159G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   28G  159G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   29G  158G  16% /scratch
/dev/mapper/3600507630affd250004a  196G   29G  158G  16% /scratch
/dev/mapper/3600507630affd250004a  196G   29G  158G  16% /scratch
/dev/mapper/3600507630affd250004a  196G   29G  158G  16% /scratch
/dev/mapper/3600507630affd250004a  196G   29G  157G  16% /scratch
/dev/mapper/3600507630affd250004a  196G   29G  157G  16% /scratch
/dev/mapper/3600507630affd250004a  196G   30G  157G  16% /scratch
/dev/mapper/3600507630affd250004a  196G   30G  157G  16% /scratch
/dev/mapper/3600507630affd250004a  196G   28G  159G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   28G  159G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   28G  159G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   28G  159G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   29G  158G  16% /scratch
/dev/mapper/3600507630affd250004a  196G   34G  153G  19% /scratch
/dev/mapper/3600507630affd250004a  196G   44G  143G  24% /scratch
/dev/mapper/3600507630affd250004a  196G   53G  133G  29% /scratch
/dev/mapper/3600507630affd250004a  196G   62G  124G  34% /scratch
/dev/mapper/3600507630affd250004a  196G   70G  117G  38% /scratch
/dev/mapper/3600507630affd250004a  196G   28G  159G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   28G  159G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   28G  159G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   29G  158G  16% /scratch
/dev/mapper/3600507630affd250004a  196G   27G  160G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   27G  160G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   27G  160G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   27G  160G  15% /scratch
/dev/mapper/3600507630affd250004a  196G   27G  160G  15% /scratch
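
The monitoring loop above can be turned into a small reusable sketch that also summarizes the log afterwards. The function names are hypothetical, and the logger assumes GNU df (for -BG); the summary simply takes the lowest and highest "Used" value in the log, which is one possible definition of the rise and not necessarily how the figures quoted in this thread were derived.

```shell
#!/bin/sh
# Log the used space of one mount point at a fixed interval.
# Usage (hypothetical): log_disk_use /scratch 10 >> df.log
log_disk_use() {
    mount_point=$1
    interval=${2:-10}
    while true; do
        # -P keeps each filesystem on one line; -BG reports whole GiB (GNU df).
        df -P -BG "$mount_point" | tail -n 1
        sleep "$interval"
    done
}

# Read logged df lines on stdin and print "min max rise" (in GB) of the
# Used column. Lines that are not df samples (e.g. a wrapped "/scratch"
# line or the shell prompt) are ignored.
summarize_disk_use() {
    awk '
        NF >= 5 && $3 ~ /^[0-9]+G$/ {
            used = $3 + 0
            if (n == 0 || used < min) min = used
            if (n == 0 || used > max) max = used
            n++
        }
        END { if (n > 0) printf "%d %d %d\n", min, max, max - min }
    '
}
```

For example, `summarize_disk_use < df.log` prints three numbers: the baseline, the peak, and their difference. On a worker with parallel runners the difference is only an upper bound for what a single test used.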



Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space

2024-03-05 Thread Paul Gevers

Hi

On 05-03-2024 10:16 a.m., Andreas Beckmann wrote:

Feel free to block 15-17 temporarily, too.


I already did that ;).

I'll try when I see 14 in the archive on s390x.

Paul



Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space

2024-03-05 Thread Andreas Beckmann

On 05/03/2024 07.45, Paul Gevers wrote:
In the upstream report you mention it's the same across all versions and 
yesterday we had the same problem with -15. Will you fix the other 
versions too? Do you want me to clone this bug for that?


I'll update all 4 branches over the next days, no need for extra bugs ;-)
But first I'd like to see the s390x build happen and your confirmation 
that this unbreaks the CI infrastructure. But at least ppc64 and sparc64 
built with 500MB instead of 40GB now ;-)

Feel free to block 15-17 temporarily, too.

Andreas



Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space

2024-03-04 Thread Paul Gevers

Hi Andreas,

Thanks for the upload.

On 04-03-2024 12:26 p.m., Andreas Beckmann wrote:
Control: forwarded -1 
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/2397


On 03/03/2024 20.52, Paul Gevers wrote:

Source: spirv-llvm-translator-14
Version: 14.0.0-10


In the upstream report you mention it's the same across all versions and 
yesterday we had the same problem with -15. Will you fix the other 
versions too? Do you want me to clone this bug for that?


Paul



Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space

2024-03-04 Thread Andreas Beckmann
Control: forwarded -1 
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/2397


On 03/03/2024 20.52, Paul Gevers wrote:

Source: spirv-llvm-translator-14
Version: 14.0.0-10



For a couple of days now, our workers on s390x have been dying because some
test is filling up all disk space. Several days ago, I wrongly suspected [...]

One of the suspects started to be spirv-llvm-translator-14, so I ran its 
autopkgtest manually, while logging disk use every 10 seconds (I started 
slightly delayed because I monitored the wrong partition first). As you 
can see below, during the test it grows from 17 GB (at the end) to its 
peak at 179 GB. That's not acceptable on our infrastructure. One file I 
happened to spot on the way was 
build/test/test_output/DebugInfo/Generic/Output/two-cus-from-same-file.ll.tmp:

-rw-r--r-- 1 root root  41G Mar  3 19:18 two-cus-from-same-file.ll.tmp

I have added spirv-llvm-translator-14 to our reject-list on s390x.

As this seems to be a rather new issue, I'm wondering if it's due to:
* Add build-needed autopkgtest for spirv-headers compat check.


Probably.

The buildds report disk usage when building spirv-llvm-translator-* 
between 400MB and 600MB on all architectures except s390x, ppc64, 
sparc64, i.e. all the big-endian ones, where it's slightly above 40GB 
(which very well corresponds to the file you spotted).
This started with 14.0.0-2 (i.e. 14.0.0-1 was around 500MB on s390x, 
too) which had "* Enable build-time tests, ignore failures on !amd64."


So maybe I should skip the build-time tests on big-endian altogether.
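
A minimal sketch of such a byte-order check, assuming dpkg-architecture (from dpkg-dev) is available at build time. The function names are hypothetical, and this is not necessarily how the actual upload handles it; in debian/rules the same DEB_HOST_ARCH_ENDIAN value would more typically be used from make.

```shell
#!/bin/sh
# Print the byte order of the host architecture: "little" or "big".
host_endian() {
    dpkg-architecture -qDEB_HOST_ARCH_ENDIAN
}

# Succeed only if the build-time tests should run.
# The byte order can be passed as $1; otherwise it is looked up.
should_run_tests() {
    endian=${1:-$(host_endian)}
    [ "$endian" = "little" ]
}

# Example use in a build script:
#   if should_run_tests; then make test; else echo "skipping tests"; fi
```

Taking the byte order as an optional argument keeps the decision testable on any machine, independent of the architecture it happens to run on.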

Failure rates:
amd64: 0%
i386: <1%
ppc64el: <2%
most: <10%
s390x: >60%
ppc64: >60%

(Upstream seems to test the testsuite only on amd64, 
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/1964)


Andreas



Bug#1065395: spirv-llvm-translator-14: autopkgtest on s390x uses huge amount of disk space

2024-03-03 Thread Paul Gevers

Source: spirv-llvm-translator-14
Version: 14.0.0-10
Severity: serious
X-Debbugs-CC: debian...@lists.debian.org
User: debian...@lists.debian.org
Usertags: issue

Dear maintainers,

For a couple of days now, our workers on s390x have been dying because some
test is filling up all disk space. Several days ago, I wrongly suspected 
src:fenics-dolfinx (bug #1064995) and added it to our reject-list. It 
didn't solve the issue, so today I spent more time on finding the 
culprit. Basically every spike above 40% in the graph [1] is a moment 
that we see issues like:


Feb 28 05:38:18 ci-worker-s390x-01 debci[1738391]: gzip: /tmp/debci-worker-43383540-cNnbLE372K/autopkgtest-incoming/testing/s390x/f/fenics-dolfinx/43383540/log.gz: No space left on device
Feb 28 05:38:18 ci-worker-s390x-01 debci[1424101]: E: Test for package fenics-dolfinx produced no exit code, aborting

One of the suspects started to be spirv-llvm-translator-14, so I ran its 
autopkgtest manually, while logging disk use every 10 seconds (I started 
slightly delayed because I monitored the wrong partition first). As you 
can see below, during the test it grows from 17 GB (at the end) to its 
peak at 179 GB. That's not acceptable on our infrastructure. One file I 
happened to spot on the way was 
build/test/test_output/DebugInfo/Generic/Output/two-cus-from-same-file.ll.tmp:

-rw-r--r-- 1 root root  41G Mar  3 19:18 two-cus-from-same-file.ll.tmp
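
Rather than spotting such a file by chance, the test output tree can be searched for oversized files. This is a rough sketch, assuming GNU find and GNU du (for -b); the function name and the default threshold are hypothetical.

```shell
#!/bin/sh
# List regular files under a directory that are larger than a threshold,
# biggest first, as "<apparent size in bytes><TAB><path>".
# Usage (hypothetical): find_large_files build/test/test_output 1G
find_large_files() {
    dir=$1
    threshold=${2:-1G}
    # -size +N selects files larger than N; du -b prints apparent sizes,
    # so sparse files are reported at their full length.
    find "$dir" -type f -size "+$threshold" -exec du -b {} + | sort -rn
}
```

Running this in a second terminal while the test suite is in progress would show which temporary files are growing, since files that are removed again after each test are not visible afterwards.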

I have added spirv-llvm-translator-14 to our reject-list on s390x.

As this seems to be a rather new issue, I'm wondering if it's due to:
* Add build-needed autopkgtest for spirv-headers compat check.

Or maybe something in the toolchain that broke on s390x?

Paul

[1]
https://ci.debian.net/munin/ci-worker-s390x-01/ci-worker-s390x-01/df.html

/dev/mapper/3600507630affd250004a  196G   40G  146G  22% /scratch
/dev/mapper/3600507630affd250004a  196G   49G  138G  27% /scratch
/dev/mapper/3600507630affd250004a  196G   57G  130G  31% /scratch
/dev/mapper/3600507630affd250004a  196G   65G  122G  35% /scratch
/dev/mapper/3600507630affd250004a  196G   66G  121G  36% /scratch
/dev/mapper/3600507630affd250004a  196G   67G  120G  36% /scratch
/dev/mapper/3600507630affd250004a  196G   70G  117G  38% /scratch
/dev/mapper/3600507630affd250004a  196G   73G  114G  40% /scratch
/dev/mapper/3600507630affd250004a  196G   76G  111G  41% /scratch
/dev/mapper/3600507630affd250004a  196G   79G  108G  43% /scratch
/dev/mapper/3600507630affd250004a  196G   83G  104G  45% /scratch
/dev/mapper/3600507630affd250004a  196G   85G  101G  46% /scratch
/dev/mapper/3600507630affd250004a  196G   88G   98G  48% /scratch
/dev/mapper/3600507630affd250004a  196G   92G   95G  50% /scratch
/dev/mapper/3600507630affd250004a  196G   95G   92G  51% /scratch
/dev/mapper/3600507630affd250004a  196G   98G   89G  53% /scratch
/dev/mapper/3600507630affd250004a  196G  101G   86G  54% /scratch
/dev/mapper/3600507630affd250004a  196G  104G   83G  56% /scratch
/dev/mapper/3600507630affd250004a  196G  107G   80G  58% /scratch
/dev/mapper/3600507630affd250004a  196G   65G  122G  35% /scratch
/dev/mapper/3600507630affd250004a  196G   65G  122G  35% /scratch
/dev/mapper/3600507630affd250004a  196G   66G  121G  36% /scratch
/dev/mapper/3600507630affd250004a  196G   68G  118G  37% /scratch
/dev/mapper/3600507630affd250004a  196G   72G  115G  39% /scratch
/dev/mapper/3600507630affd250004a  196G   75G  112G  41% /scratch
/dev/mapper/3600507630affd250004a  196G   78G  109G  42% /scratch
/dev/mapper/3600507630affd250004a  196G   81G  106G  44% /scratch
/dev/mapper/3600507630affd250004a  196G   85G  102G  46% /scratch
/dev/mapper/3600507630affd250004a  196G   87G   99G  47% /scratch
/dev/mapper/3600507630affd250004a  196G   90G   96G  49% /scratch
/dev/mapper/3600507630affd250004a  196G   94G   93G  51% /scratch
/dev/mapper/3600507630affd250004a  196G   97G   90G  52% /scratch
/dev/mapper/3600507630affd250004a  196G  100G   87G  54% /scratch
/dev/mapper/3600507630affd250004a  196G  103G   84G  56% /scratch
/dev/mapper/3600507630affd250004a  196G  106G   81G  57% /scratch
/dev/mapper/3600507630affd250004a  196G  109G   78G  59% /scratch
/dev/mapper/3600507630affd250004a  196G  112G   74G  61% /scratch
/dev/mapper/3600507630affd250004a  196G  116G   71G  63% /scratch
/dev/mapper/3600507630affd250004a  196G  119G   68G  64% /scratch
/dev/mapper/3600507630affd250004a  196G  123G   64G  66% /scratch
/dev/mapper/3600507630affd250004a  196G  126G   61G  68%